HDFS: Hadoop Distributed File System
HDFS is a specially designed file system for storing huge data sets on a cluster of commodity hardware, using a streaming access pattern.
Now let's understand the terms cluster and streaming access pattern.
- Cluster – a Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment.
- Streaming Access Pattern – means the data is written once and read any number of times, but the content of the file is not changed after it is written.
Because HDFS works on the principle of ‘Write Once, Read Many’, streaming access is extremely important in HDFS. HDFS focuses less on how the data is stored and more on retrieving it as fast as possible, especially when analyzing logs. In HDFS, the time taken to read the complete data set matters more than the time taken to fetch a single record from it.
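The write-once, read-many idea can be illustrated with a small local sketch. This is not the HDFS API: the class `WriteOnceFile`, the helper `count_lines`, the file name `app.log`, and the 64 KB chunk size are all hypothetical names and values chosen for illustration, and an ordinary local file stands in for a file stored in HDFS.

```python
import os
import tempfile


class WriteOnceFile:
    """Toy local model of write-once, read-many semantics (not the HDFS API)."""

    def __init__(self, path):
        self.path = path

    def write(self, data: bytes) -> None:
        # Mode "xb" creates the file and fails if it already exists,
        # so the content can be written only once.
        with open(self.path, "xb") as f:
            f.write(data)

    def stream(self, chunk_size: int = 64 * 1024):
        # Read the whole file sequentially, front to back, in large chunks.
        with open(self.path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk


def count_lines(wof: WriteOnceFile) -> int:
    # A full scan, as in log analysis: every byte is read exactly once.
    return sum(chunk.count(b"\n") for chunk in wof.stream())


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        log = WriteOnceFile(os.path.join(tmp, "app.log"))
        log.write(b"GET /a\nGET /b\nPOST /c\n")
        first = count_lines(log)
        second = count_lines(log)  # reading again is always allowed
        try:
            log.write(b"changed\n")  # a second write is rejected
            rewritten = True
        except FileExistsError:
            rewritten = False
        return first, second, rewritten


if __name__ == "__main__":
    print(demo())  # (3, 3, False)
```

The sketch reads the file from start to end on every pass instead of seeking to individual records, which mirrors the point above: the cost that matters is scanning the complete data, not fetching one record.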
Now let's look at commodity hardware. Does commodity hardware include RAM?
- Commodity hardware is an inexpensive system that does not offer high quality or high availability. Hadoop can be installed on any average commodity hardware. We don’t need supercomputers or high-end hardware to work with Hadoop. Yes, commodity hardware includes RAM, because some services will be running in RAM.