Andy Reddy

Reputation: 93

Find line count of a file in HDFS to compare with the count present in QC file

I have a data file and a QC file loaded in HDFS, and I want to compare the count recorded in the QC file with the line count of the data file. For this, I wrote a shell script that extracts the count field from the QC file and runs wc -l on the data file.

For QC file:

qccount=$(webhdfs -cat hdfs://${CLUSTER_NAME}$hdfs_src_path/$directory/$qc_file_name | cut -d "|" -f2)

echo "QC file count: $qccount";

This prints out the count as 256341

For data file:

file_count=$(webhdfs -cat hdfs://${CLUSTER_NAME}$hdfs_src_path/$directory/$data_file_name | wc -l | cut -d " " -f1)

echo "File count: $file_count";

This prints out 0

Here wc -l on the file in HDFS is not working. May I know the reason?

Upvotes: 1

Views: 315

Answers (1)

PradeepKumbhar

Reputation: 3421

You should use something like this:

file_count=$(hdfs dfs -cat hdfs://${CLUSTER_NAME}$hdfs_src_path/$directory/$data_file_name | wc -l)

echo "File count: $file_count";

Note: I am not sure why you used webhdfs -cat instead of hdfs dfs -cat.
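A minimal sketch of the full comparison, building on the command above. Since wc -l counts newline characters, it prints 0 whenever nothing reaches it on stdin, for example when the upstream cat command fails and writes only to stderr; set -o pipefail makes that kind of failure visible instead of silently yielding 0. The helper names (count_lines, extract_qc_count, compare_counts) are hypothetical, and the QC layout (count in the second |-separated field) is assumed from the cut command in the question.

```shell
#!/usr/bin/env bash
# Make a pipeline fail if any command in it fails, not only the last one.
set -o pipefail

# Count the lines arriving on stdin; strip padding that some wc builds add.
count_lines() {
  wc -l | tr -d '[:space:]'
}

# Pull the count out of QC content on stdin.
# Assumption: the count is the second '|'-separated field, as in the question.
extract_qc_count() {
  cut -d '|' -f2 | tr -d '[:space:]'
}

# Compare the two counts numerically; return non-zero on mismatch.
compare_counts() {
  local expected="$1" actual="$2"
  if [ "$expected" -eq "$actual" ]; then
    echo "MATCH: $actual lines"
  else
    echo "MISMATCH: QC says $expected, data file has $actual" >&2
    return 1
  fi
}

# Usage (variables as in the question; not executed here):
#   base="hdfs://${CLUSTER_NAME}$hdfs_src_path/$directory"
#   qccount=$(hdfs dfs -cat "$base/$qc_file_name" | extract_qc_count) || exit 1
#   file_count=$(hdfs dfs -cat "$base/$data_file_name" | count_lines) || exit 1
#   compare_counts "$qccount" "$file_count"
```

With pipefail set, a failing hdfs dfs -cat makes the whole assignment return non-zero, so the script stops at the || exit 1 rather than comparing against a misleading 0.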

Upvotes: 3
