When we think about Big Data solutions, we usually think about Hadoop. Apache Hadoop is an open-source, fault-tolerant distributed framework that leverages commodity hardware for large-scale data storage and processing. Hadoop enables applications to work across a large number of nodes and with large-scale storage without exposing the complexity of clustering to the end user.

In this seminar, we will discuss the design principles behind Apache Hadoop and explain the architecture of its core sub-systems: HDFS and MapReduce. We will then dive into the Hadoop ecosystem and other projects related to Hadoop-based big data solutions.

This seminar is for application developers, team leaders, data scientists, and architects who want to understand Hadoop’s architecture and its related projects.

  • The Big Data challenge
    • Introducing Hadoop and Hadoop Core Technologies
    • Hadoop Core
    • HDFS
    • MapReduce
    • YARN and MapReduce 2
  • Hadoop distributions: Cloudera, Hortonworks, MapR
  • Extending the Hadoop Ecosystem with Apache Top-Level Projects:
    • Scripting and ETL using Pig
    • Data Stores using HBase, Hive, Parquet
    • Data Management using HCatalog, Avro
    • Data Flows and Integration using Sqoop and Flume
  • Operation, coordination, and management of the Hadoop cluster using ZooKeeper and Hue
  • Outside the Hadoop Ecosystem:
    • Apache Spark
    • The NoSQL world and Hadoop


  • Seminar ID: 44013
  • Location: Daniel Hotel
  • Date: Tuesday 20th of June 2017

Main Speaker

<a href='http://devgeekweek.jbh.co.il/speaker/ram-kedem/'>Ram Kedem</a>

Data and BI team leader at Edge226. Ram brings a decade of experience working with various Database platforms, along with knowledge …