Linux Hadoop Cloudera CDH3
What is Hadoop?
Hadoop is a free framework written in Java that facilitates the writing of distributed applications.
Some components of Hadoop are extremely popular today:
- MapReduce, a programming model that parallelizes a task or calculation over large amounts of data;
- HBase, a distributed database for large data volumes;
- HDFS, the distributed file system.
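To make the MapReduce idea above more concrete, here is a minimal single-process sketch of a word count in Python. It does not use Hadoop at all: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and on a real cluster the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what makes the
    # map step easy to spread over many machines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both steps can be distributed without coordination beyond the shuffle.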
A notable feature of Hadoop is that the cluster keeps operating even if several of its nodes fail.
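One reason a cluster can survive node failures is that HDFS stores several copies of each data block on different nodes. The sketch below is a simplified illustration under stated assumptions: a replication factor of 3 (the usual HDFS default), a naive round-robin placement (real HDFS placement is rack-aware and more involved), and hypothetical node names.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified assumption: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = {
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        }
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a healthy node."""
    failed = set(failed_nodes)
    return [block for block, holders in placement.items() if holders - failed]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical names
    placement = place_blocks(4, nodes)
    # With two failed nodes, every block still has a surviving replica.
    print(readable_blocks(placement, ["node2", "node4"]))
```

In this toy model a block only becomes unreadable when every node holding one of its replicas has failed at the same time.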
Why the Cloudera distribution?
Cloudera is now a reference in the Hadoop world and a major contributor to the project. We offer the CDH3 version on Ubuntu 10.04 (64-bit), which includes HDFS, MapReduce, HBase, Hive, ZooKeeper and Hue.
What are the characteristics of the 3 versions offered by OVH?
- 'Pseudo-distributed' Mode: a version for testing and development. All Hadoop components run on a single machine.
- 'Master' Mode: a Hadoop cluster must have a 'Master' server that is responsible for managing the cluster. The Master takes the roles of 'JobTracker' for MapReduce and 'NameNode' for HDFS.
- 'Slave' Mode: for all the other nodes in your cluster, which perform the calculations ('TaskTracker') and store the data ('DataNode').
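The division of roles between the three modes can be summarized in a small Python sketch. The role names come from the list above; the hostnames, the `plan_cluster` helper and the "exactly one Master" check are illustrative assumptions, not part of the OVH or Cloudera tooling.

```python
# Hadoop daemons run by each installation mode, as described above.
ROLES_BY_MODE = {
    "pseudo-distributed": ["NameNode", "DataNode", "JobTracker", "TaskTracker"],
    "master": ["NameNode", "JobTracker"],
    "slave": ["DataNode", "TaskTracker"],
}


def plan_cluster(servers):
    """Map each hostname to the daemons it runs.

    `servers` is a dict of hostname -> mode. Assumption for this sketch:
    a cluster that contains slave nodes needs exactly one master.
    """
    masters = [host for host, mode in servers.items() if mode == "master"]
    slaves = [host for host, mode in servers.items() if mode == "slave"]
    if slaves and len(masters) != 1:
        raise ValueError("a cluster with slave nodes needs exactly one master")
    return {host: ROLES_BY_MODE[mode] for host, mode in servers.items()}


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    layout = plan_cluster({"master1": "master", "slave1": "slave", "slave2": "slave"})
    for host, roles in layout.items():
        print(host, "->", ", ".join(roles))
```

A pseudo-distributed server appears alone in such a plan, since it runs all four daemons itself.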
Easy to use
|Access to a server| |Email service| |
|FTP (Port 21)|-|POP3 (Port 110)|-|
|SSH (Port 22)|-|IMAP (Port 143)|-|
|TSE (Port 3389)|-|SMTP (Port 25)|-|
|Web (Port 80)|-|MySQL|-|
|Named (Port 53)|-|PHP|-|