Reputation: 93
I have a data file and a QC file loaded in HDFS, and I want to compare the count recorded in the QC file with the line count of the data file. For this, I wrote a shell script that extracts the count field from the QC file and runs wc -l on the data file.
For the QC file:
qccount=$(webhdfs -cat hdfs://${CLUSTER_NAME}$hdfs_src_path/$directory/$qc_file_name | cut -d "|" -f2)
echo "QC file count: $qccount";
This prints out the count as 256341
For the data file:
file_count=$(webhdfs -cat hdfs://${CLUSTER_NAME}$hdfs_src_path/$directory/$data_file_name | wc -l | cut -d " " -f1)
echo "File count: $file_count";
This prints out 0
Here, wc -l does not seem to work for the file in HDFS. What could be the reason?
Upvotes: 1
Views: 315
Reputation: 3421
You should use something like this:
file_count=$(hdfs dfs -cat hdfs://${CLUSTER_NAME}$hdfs_src_path/$directory/$data_file_name | wc -l)
echo "File count: $file_count";
Note: I am not sure why you used webhdfs -cat instead of hdfs dfs -cat.
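To finish the comparison the question asks for, here is a minimal sketch of how the two counts could be checked against each other. The helper name counts_match is hypothetical and not part of any Hadoop tool. The commented usage reuses the variables from the question and assumes a working hdfs client. Note that wc -l prints 0 when it receives empty input, so a 0 may simply mean the cat command failed; set -o pipefail is one way to surface that.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit status 0) only when both counts
# are non-empty and identical after whitespace is removed.
counts_match() {
    local qc_count data_count
    # Strip whitespace, since some wc implementations pad their output.
    qc_count=$(printf '%s' "$1" | tr -d '[:space:]')
    data_count=$(printf '%s' "$2" | tr -d '[:space:]')
    [ -n "$qc_count" ] && [ "$qc_count" = "$data_count" ]
}

# Usage with the variables from the question (assumes the hdfs client works):
#   set -o pipefail   # a failing `hdfs dfs -cat` is no longer hidden behind wc
#   qccount=$(hdfs dfs -cat "hdfs://${CLUSTER_NAME}${hdfs_src_path}/${directory}/${qc_file_name}" | cut -d "|" -f2)
#   file_count=$(hdfs dfs -cat "hdfs://${CLUSTER_NAME}${hdfs_src_path}/${directory}/${data_file_name}" | wc -l)
#   if counts_match "$qccount" "$file_count"; then
#       echo "Counts match"
#   else
#       echo "Counts differ: QC=$qccount, file=$file_count"
#   fi
```

The comparison is done on strings rather than with -eq so that an empty or non-numeric value is treated as a mismatch instead of causing a shell error.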
Upvotes: 3