Reputation: 7994
I have questions concerning CDH and how it is maintained:
When I go to the packaging information for a specific CDH version, I can check the package version of each component (for instance, for CDH 5.5.5: https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_55.html#cdh_555 ). However, I don't understand what the "package version" refers to exactly. For instance, for the Apache Parquet component, the "package version" is parquet-1.5.0+cdh5.5.5+181 . How can I find out exactly what source code is packaged? Does this correspond to a label in a specific repo? If I go to the "official" Apache Parquet repo, there is no "cdh5.5.5" branch; the closest thing I can find is a tag called "1.5.0" ( https://github.com/apache/parquet-mr/tree/parquet-1.5.0 ). How do the CDH maintainers know exactly what parquet-1.5.0+cdh5.5.5+181 refers to?
Still concerning Apache Parquet: how come even the most recent CDH versions are still using Apache Parquet 1.5.0? The date on that tag is 22 May 2014, i.e. more than 3 years ago. Why don't they upgrade to a newer version, like 1.6.0? The reason I'm asking is that there is a bug in 1.5.0 that was fixed more than 3 years ago in Parquet 1.6.0, yet the latest CDH version is still using 1.5.0. Is there a reason why they keep using such an old, buggy version?
Thanks!
Upvotes: 0
Views: 46
Reputation: 5957
You are correct in assuming that parquet-1.5.0+cdh5.5.5+181 is closest to Parquet 1.5.0. However, the code will not be identical to upstream Parquet 1.5.0, because:
CDH enforces cross-component compatibility. Code and applications using parquet-1.5.0 must also work with all the other Hadoop services (HDFS, Hive, Oozie, YARN, Spark, Solr, HBase). Any incompatibilities have to be fixed, so CDH's Parquet code includes those fixes.
CDH enforces major-version compatibility. This means an application written on CDH5.1 should still work on CDH5.5, CDH5.7, and every other CDH5.x version. Maintaining this can also require changes to the codebase.
The best way to interpret this is to say that parquet-1.5.0+cdh5.5.5+181 will support all features provided in parquet 1.5.0 and will also work with the corresponding Hadoop services packaged with CDH5.5.
Version compatibility is also the reason why CDH Hadoop services run older versions of the related upstream projects. Moving to a newer upstream release makes backwards compatibility much harder to maintain, especially if APIs change between versions.
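To make the naming scheme discussed above concrete, here is a minimal Python sketch that splits a package version string such as parquet-1.5.0+cdh5.5.5+181 into its visible parts: component name, upstream base version, CDH release, and trailing number. The function name `parse_cdh_package_version`, the field names, and the regular expression are hypothetical and inferred only from the one example string in the question, so other CDH package strings may not match. The meaning of the trailing number is not stated in this thread, so the sketch treats it as an opaque counter.

```python
import re
from typing import NamedTuple


class CdhPackageVersion(NamedTuple):
    """Parts of a CDH package version string (field names are illustrative)."""
    component: str         # e.g. "parquet"
    upstream_version: str  # upstream base release, e.g. "1.5.0"
    cdh_release: str       # CDH release it was built for, e.g. "5.5.5"
    suffix: int            # trailing number; meaning not stated in this thread


# Assumed layout: <component>-<upstream>+cdh<release>+<number>
_PATTERN = re.compile(
    r"^(?P<component>[A-Za-z0-9_.-]+?)-(?P<upstream>\d+(?:\.\d+)*)"
    r"\+cdh(?P<cdh>\d+(?:\.\d+)*)\+(?P<suffix>\d+)$"
)


def parse_cdh_package_version(text: str) -> CdhPackageVersion:
    """Split a string like 'parquet-1.5.0+cdh5.5.5+181' into its parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in the assumed CDH package version layout: {text!r}")
    return CdhPackageVersion(
        component=match.group("component"),
        upstream_version=match.group("upstream"),
        cdh_release=match.group("cdh"),
        suffix=int(match.group("suffix")),
    )


if __name__ == "__main__":
    print(parse_cdh_package_version("parquet-1.5.0+cdh5.5.5+181"))
```

The upstream version extracted this way only identifies the closest upstream release. As explained above, the packaged code can still differ from that tag.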
Upvotes: 0