When we think about Big Data solutions, we usually think about Hadoop. The Hadoop framework is based on a programming model called MapReduce, which enables computing solutions that are scalable, flexible, fault-tolerant, and cost-effective. For all these benefits, MapReduce code is complex to write, hard to use and maintain, and sometimes slow to run.
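To make the MapReduce programming model concrete, here is a minimal word-count sketch in plain Python that simulates the map, shuffle, and reduce phases on a single machine. It is an illustration only, not Hadoop's API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would distribute each phase across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result
    return key, sum(values)


def word_count(lines):
    # Chain the three phases together on a local list of lines
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Spark and Hadoop", "Hadoop uses MapReduce"]
    print(word_count(docs))
```

Even this toy version needs three separate functions for a simple count, which hints at why hand-written MapReduce jobs become verbose as the logic grows.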

Spark is an open-source alternative to MapReduce, designed to make it easier to build and run fast, sophisticated applications, with or without Hadoop. Originally developed at UC Berkeley's AMPLab and now maintained by the Apache Software Foundation, it is widely seen as a successor to the MapReduce framework and is now considered one of the leading engines for massively parallel processing (MPP).

Spark's main feature is in-memory cluster computing, which increases an application's processing speed. It can run up to 100 times faster than MapReduce for iterative algorithms and interactive data mining, with much less code. Spark is also very flexible: it offers Java, Scala, and Python APIs for ease of development, and it provides interactive shells for testing and debugging.

In this session, we will learn what Spark is, how to use it, and how to integrate it with the rest of our Big Data solutions. We will walk through code samples and show real-world data science use cases.

The seminar is designed for developers, team leaders, data scientists and CDOs.

The session includes code samples, so some programming background is recommended.

Topics:

  • Introduction to Big Data, Hadoop, and MapReduce
  • Introducing Spark: a fast and general engine for large-scale data processing
  • Why and when to use Spark
  • How to use Spark:
    • Working with RDDs
    • Functional Programming with Scala, Python, and Java
    • Parallel Programming with Spark
    • Writing Spark Applications
    • Using Spark in Clusters
  • Spark Modules:
    • Spark Streaming
    • Spark SQL
    • MLlib
    • GraphX
  • Spark & Data Science in the real world
    • Data science in a nutshell
    • Hands-on machine learning example with Spark & IPython Notebook
    • Spark for classification and clustering
    • Machine learning on web-scale graphs with Spark

Info

  • Seminar ID: 44012
  • Location: Daniel Hotel
  • Date: Monday 19th of June 2017

Main Speaker

<a href='http://devgeekweek.jbh.co.il/speaker/avi-zimroni/'>Avi Zimroni</a>

Big data expert with extensive experience in large databases, data mining, and data science.