Hadoop MapReduce
MapReduce is the heart of Hadoop. It is a programming model, with an associated implementation, for processing and generating large data sets in parallel. As a framework, it processes parallelizable problems across large datasets using a large number of computers (nodes), collectively referred to as a cluster or a grid. It can process data stored either in a file system (unstructured) or in a database (structured). It can also take advantage of data locality, processing data near the place where it is stored to reduce the distance the data has to travel.
How MapReduce Works
A MapReduce job splits a large data set into independent chunks and organizes them into key-value pairs for parallel processing. This parallel processing improves the speed and reliability of the cluster, so solutions are returned more quickly and more reliably. The Map function divides the input into ranges according to the InputFormat and creates a map task for each range in the input. The JobTracker distributes those tasks to the worker nodes. The output of each map task is partitioned into a group of key-value pairs for each reducer. The Reduce function then collects the various results and combines them to answer the larger problem that the master node needs to solve. Each reducer pulls the relevant partition from the machines where the maps executed, and then writes its output back into HDFS. Thus, the reducer is able to collect the data from all the maps for its keys and combine them to solve the problem.
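The flow described above can be sketched in a few lines of Python. This is only a single-process, in-memory illustration, not Hadoop's implementation: there is no JobTracker, no worker nodes and no HDFS here. The names run_job, partition_for, map_readings and reduce_max are hypothetical, the sample temperature readings are made up for the example, and the CRC32 hash is an assumed stand-in for whatever partitioning function a real cluster uses.

```python
from collections import defaultdict
from zlib import crc32


def partition_for(key, num_reducers):
    # Stable hash, so the same key always lands in the same partition.
    # (Assumption: CRC32 is used here purely for illustration.)
    return crc32(str(key).encode("utf-8")) % num_reducers


def run_job(chunks, map_fn, reduce_fn, num_reducers=2):
    # Map phase: one "map task" per input chunk. Each task partitions
    # its output into one group of key-value pairs per reducer.
    map_outputs = []
    for chunk in chunks:
        partitions = [[] for _ in range(num_reducers)]
        for key, value in map_fn(chunk):
            partitions[partition_for(key, num_reducers)].append((key, value))
        map_outputs.append(partitions)

    # Reduce phase: each reducer pulls its own partition from every
    # map task, groups the values by key, and combines them.
    results = {}
    for reducer_id in range(num_reducers):
        grouped = defaultdict(list)
        for partitions in map_outputs:
            for key, value in partitions[reducer_id]:
                grouped[key].append(value)
        for key, values in grouped.items():
            results[key] = reduce_fn(key, values)
    return results


def map_readings(chunk):
    # Emit one (city, temperature) pair per record in the chunk.
    for city, temperature in chunk:
        yield city, temperature


def reduce_max(key, values):
    # Combine all values for a key into a single result.
    return max(values)


chunks = [
    [("oslo", 3), ("cairo", 31)],
    [("oslo", 7), ("cairo", 28)],
]
result = run_job(chunks, map_readings, reduce_max)
print(result)
```

Because every pair with the same key is routed to the same partition, each reducer sees all the values for its keys, no matter which map task produced them.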
Word count
Mapper: It maps input key-value pairs to a set of intermediate key-value pairs.
Reducer: It reduces a set of intermediate values that share a key to a smaller set of values.
In the word count MapReduce program, we provide any text file as input. When the program runs, it goes through the following phases:
Splitting: Each line in the input file is split into words.
Mapping: A key-value pair is formed for each word, where the word is the key and 1 is the value assigned to it.
Shuffling: Key-value pairs with the same key are grouped together.
Reducing: The values for each key are combined, giving the number of times that word occurs.
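The four phases of word count can be tried out with a small Python sketch. It runs in one process and only mimics the phases; it is not the Hadoop API. The function names are hypothetical, and splitting on whitespace and lower-casing the words are assumptions made for the example.

```python
from collections import defaultdict


def split_words(line):
    # Splitting: break one input line into words.
    # (Assumption: words are separated by whitespace and lower-cased.)
    return line.lower().split()


def map_words(words):
    # Mapping: the word is the key and 1 is the value.
    return [(word, 1) for word in words]


def shuffle(pairs):
    # Shuffling: group the values of pairs that share a key.
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return dict(grouped)


def reduce_counts(grouped):
    # Reducing: combine the values for each key by summing them.
    return {word: sum(ones) for word, ones in grouped.items()}


def word_count(lines):
    pairs = []
    for line in lines:
        pairs.extend(map_words(split_words(line)))
    return reduce_counts(shuffle(pairs))


counts = word_count(["deer bear river", "car car river", "deer car bear"])
print(counts)
```

After shuffling, the key "car" holds the values [1, 1, 1], so the reduce step reports a count of 3 for it.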