Hadoop is an Apache project that was developed largely at Yahoo, based on the white papers published by Google on the Google File System and MapReduce. It was designed to be a distributed and scalable system. As we can see in the next figure, Hadoop is composed of several pieces of software that together make it relatively easy to store and process data in a distributed way. I’m going to introduce the most important parts of Hadoop that make it work.
It was originally composed of a distributed file system (HDFS) and a framework (MapReduce) that makes it easy to distribute processing across many servers.
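To give a rough idea of the processing side, the sketch below mimics the map, shuffle and reduce steps of the MapReduce model in plain Python on a single machine. It is not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only serve this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word of one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the values by key, the step that sits between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum all the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three steps in sequence, in a single process (illustration only)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

On a real cluster the input lines would come from files stored in HDFS, and the map and reduce calls would run in parallel on different servers; here everything runs locally just to show the flow of the data.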
Here are several articles that explain different parts of Hadoop:
There are different distributions of Hadoop, much like Linux distributions, which are fully packaged and more or less automate the installation.