How to extract and load data to Hadoop/HDFS/Hive from Informatica PowerCenter
Basic Overview of Hadoop : Hadoop is a framework for the distributed processing of large data sets across clusters of commodity computers that do not share any storage or disks. It is designed to run on commodity hardware and uses Google's MapReduce and Google File System (GFS) technologies as its foundation.

The two major components of a Hadoop system are:

1- Hadoop Distributed File System (HDFS)
2- MapReduce

HDFS is responsible for storing data on the cluster of machines, while MapReduce is the data processing component of Hadoop. In a Hadoop cluster, data is distributed across all the nodes of the cluster as it is being loaded in: HDFS splits large data files into chunks (blocks), each of which is managed by a different node in the cluster.
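To make that block-splitting behaviour concrete, here is a minimal Java sketch (independent of any Informatica workflow) that copies a local extract file into HDFS and then prints which nodes hold each block. The NameNode URI and the file paths are placeholder assumptions; adjust them for your own cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockDemo {
    public static void main(String[] args) throws Exception {
        // Assumed NameNode address; replace with your cluster's fs.defaultFS.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);

        // Copy a hypothetical local extract file into HDFS.
        Path local = new Path("/tmp/extract.csv");           // placeholder local path
        Path remote = new Path("/data/staging/extract.csv"); // placeholder HDFS path
        fs.copyFromLocalFile(local, remote);

        // HDFS has split the file into blocks; list which DataNodes hold each one.
        FileStatus status = fs.getFileStatus(remote);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```

Each printed line corresponds to one HDFS block (128 MB by default in Hadoop 2.x), which illustrates how a single large file ends up spread over multiple DataNodes rather than sitting on one machine.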