Microsoft R on Apache Spark coming to Hadoop Summit

Vu Anh Nguyen

Microsoft has published a post on the company’s TechNet blog announcing support for the open-source cluster computing framework Apache Spark in Microsoft R Server for Hadoop, as part of the company’s participation in the Hadoop Summit.

Specifically, users can now run R functions over Spark nodes to train models on data “1000 times larger than before”, at 125 times the speed of running open-source R with CRAN algorithms, thanks to the combination of R Server’s parallelized algorithms and Spark’s in-memory architecture. Microsoft R Client has also been announced, giving data scientists a free R client for analyzing data with R functions, both on local workstations and throughout Microsoft R Server. The news follows Microsoft’s statement of “extensive commitment” to Apache Spark, as well as the company’s renewed vigor in open source.

Microsoft also announced major architecture overhauls to DeployR, the part of Microsoft R Server that provides analytics as web services, with improvements including a wider choice of supported repository databases, more web and installation security features, and improved Security Policy Management. Previously, Microsoft had announced R Server for HDInsight, bringing predictive modeling and machine learning on Spark to Azure. For more information, head to Microsoft’s booth at Hadoop Summit in San Jose today, and tune in to the keynote by Joseph Sirosh, corporate vice president at Microsoft.