The Apache Software Foundation has announced a new release of its platform, Apache Hadoop 3.3.0, a year and a half after the previous update. The platform itself is a tool for organizing distributed processing of large amounts of data using MapReduce. Hadoop bundles the utilities, libraries, and frameworks needed to develop and run distributed programs on clusters of thousands of nodes.
Hadoop ships with its own specialized file system, the Hadoop Distributed File System (HDFS), which provides data redundancy and optimizations for MapReduce applications. HDFS is designed to store large files spread across the individual nodes of a compute cluster. Thanks to these capabilities, Hadoop is used by some of the largest companies and organizations; Google even granted Hadoop a license to technologies covered by its patents on the MapReduce method.
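The MapReduce model mentioned above can be illustrated with a toy, single-process sketch in plain Python. This is only a conceptual analogy, not Hadoop's actual Java API: in a real cluster, the map, shuffle, and reduce phases run in parallel across many nodes over HDFS data.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs from each input record
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group values by key (Hadoop does this across the cluster)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values collected for each key
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big cluster", "data node"]
result = reduce_phase(shuffle(map_phase(docs)))
print(result)  # {'big': 2, 'data': 2, 'cluster': 1, 'node': 1}
```

The classic word count above is the canonical MapReduce example: each phase depends only on its input key-value pairs, which is what lets Hadoop distribute the work.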
Here is a list of the most important changes in the new version:
- Added support for ARM-based platforms.
- The Protobuf (Protocol Buffers) format, used to serialize structured data, has been upgraded to version 3.7.1.
- Added Delegation Token authentication for the S3A connector, improved caching of 404 responses, and improved S3Guard performance and overall reliability.
- Added a DNS resolution service that lets clients discover servers by hostname through DNS, so there is no longer a need to list every host in the configuration.
- YARN (Yet Another Resource Negotiator) now provides a searchable catalog of applications.
- Added support for scheduling OPPORTUNISTIC containers through the ResourceManager.
- Added support for Java 11.
- Added a Tencent Cloud COS file system connector, which is required to access COS object storage.
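As an illustration of the last item, the COS connector is typically enabled through `core-site.xml`. This is a minimal sketch: the property names follow the Hadoop documentation for the `cosn` connector, and the credential and region values are placeholders; check the documentation for your exact release.

```xml
<!-- core-site.xml: minimal sketch for the Tencent COS (cosn) connector.
     Property names follow the Hadoop cosn documentation; the credential
     and region values below are placeholders, not working examples. -->
<configuration>
  <property>
    <name>fs.cosn.impl</name>
    <value>org.apache.hadoop.fs.CosFileSystem</value>
  </property>
  <property>
    <name>fs.cosn.userinfo.secretId</name>
    <value>YOUR_SECRET_ID</value>
  </property>
  <property>
    <name>fs.cosn.userinfo.secretKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
  <property>
    <name>fs.cosn.bucket.region</name>
    <value>YOUR_BUCKET_REGION</value>
  </property>
</configuration>
```

With a configuration along these lines, COS buckets become reachable as `cosn://` paths through the standard Hadoop FileSystem API, alongside HDFS and S3A.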
Because Hadoop is under active development, the market for solutions built on it is growing rapidly: from roughly $1.7 billion in 2019 to a projected $9.4 billion by 2024, according to experts.