Apache Spark is a popular engine for in-memory, distributed big-data processing. R is a widely used statistical programming language. Since machine learning largely amounts to statistical and BI processing of very large data sets, combining the two, Spark and R, is a natural step.

This seminar introduces both Spark and R and focuses on how the two can be used together, via SparkR, to build powerful ML implementations.

  • Introduction to Spark
  • Introduction to R & RStudio
  • SparkR
    • Connecting the R module to a Spark cluster
    • Configuring data sources and setting up SparkDataFrames
    • SparkDataFrames
      • Selecting
      • Grouping & aggregating
      • Functions & custom functions
  • Machine Learning
    • MLlib – Introduction
    • SparkR operations
      • Supported ML Algorithms
      • Summaries
      • Predict
      • write.ml & read.ml

Info

  • Seminar ID: 44044
  • Location: Daniel Hotel
  • Date: Thursday 22nd of June 2017

Main Speaker