How to extract and load data to Hadoop/HDFS/Hive from Informatica Powercenter

Basic Overview of Hadoop:

Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers that do not share any storage or disk. Hadoop is designed to run on commodity hardware, and it uses Google's MapReduce and Google File System technologies as its foundation.

The two major components of a Hadoop system are:

1- Hadoop Distributed File System (HDFS)
2- MapReduce

HDFS is responsible for storing data on the cluster of machines, while MapReduce is the data processing component of Hadoop.

In a Hadoop cluster, data is distributed to all the nodes of the cluster as it is being loaded in. HDFS splits large data files into chunks (blocks) which are managed by different nodes in ...
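To make the storage side concrete, here is a minimal sketch that writes a small file into HDFS through the standard Hadoop Java FileSystem API. The NameNode address (hdfs://namenode:8020), the target path, and the class name are hypothetical placeholders, so treat this as an illustration of how a client hands data to HDFS, not as the exact load step PowerCenter performs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoadSketch {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's NameNode (placeholder address).
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);

        // Hypothetical staging path for the extracted records.
        Path target = new Path("/user/etl/stage/orders.csv");

        // HDFS transparently splits the file into blocks and replicates
        // them across DataNodes; the client only writes a byte stream.
        try (FSDataOutputStream out = fs.create(target, true)) {
            out.writeBytes("1,2014-01-15,199.99\n");
        }
        fs.close();
    }
}

Once a file like this sits in HDFS, a Hive external table can be declared over its directory so the data becomes queryable without being moved again.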