Hadoop is a Apache project and has been developped by Yahoo based on the whitepaper from Google. It was designed to be a distributed and scalable system. As we can see in the next figure , Hadoop is composed of several softwares that are part of the solution and give the possibility to store and process the data in a distributed way relatively easily. I’m going to introduce the most important parts of Hadoop that make it work.

Hadoop Ecosystem
Hadoop Ecosystem

It was bassicaly composed by a distributed files system (HDFS) and a frame to distributed in a easy way the processing accros many servers.

Here are several articles that explain differents part of Hadoop:

There are differents distributions of Hadoop as a Linux distributions that are completly packaged and more or less automated for the installation.

One thought on “Hadoop

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.