Matthias

Reputation: 5764

Show how a parquet file is replicated and stored on HDFS

Storing data in Parquet format results in a folder containing many small files on HDFS.

Is there a way to view how those files are replicated in HDFS (on which nodes)?

Thanks in advance.

Upvotes: 0

Views: 359

Answers (1)

eliasah

Reputation: 40380

If I understand your question correctly, you actually want to track which data blocks are stored on which data nodes, and that's not apache-spark specific.

You can use the hadoop fsck command as follows:

hadoop fsck <path> -files -blocks -locations

This will print out locations for every block in the specified path.
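If you want to turn that listing into a block-to-datanode mapping programmatically, a small parser can help. The sketch below is an assumption about the typical shape of fsck's block lines (block id followed by a bracketed list of `DatanodeInfoWithStorage[host:port,...]` entries); the exact format can vary between Hadoop versions, so treat the regexes as illustrative.

```python
import re

# Assumed fsck block-line shape (illustrative, may differ by Hadoop version):
#   0. BP-1:blk_1073741825_1001 len=423 Live_repl=2 [DatanodeInfoWithStorage[10.0.0.1:9866,DS-a,DISK], ...]
BLOCK_RE = re.compile(r"(blk_[\d_]+).*?\[(.*)\]$")
NODE_RE = re.compile(r"DatanodeInfoWithStorage\[([\d.]+:\d+)")

def block_locations(fsck_output: str) -> dict:
    """Map each block id to the host:port of every datanode holding a replica."""
    locations = {}
    for line in fsck_output.splitlines():
        m = BLOCK_RE.search(line)
        if not m:
            continue
        block_id, nodes_part = m.groups()
        locations[block_id] = NODE_RE.findall(nodes_part)
    return locations
```

You would feed it the captured output, e.g. `block_locations(subprocess.run(["hadoop", "fsck", path, "-files", "-blocks", "-locations"], capture_output=True, text=True).stdout)`.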

Upvotes: 2

Related Questions