Reputation: 1352
I've uploaded 50 GB of data to a Hadoop cluster, but now I want to delete the first row of the data file. Removing that row manually and then uploading the file to HDFS again would be time-consuming. Is there a better way?
Upvotes: 2
Views: 3599
Reputation: 294247
HDFS files are immutable (for all practical purposes).
You need to upload the modified file(s). You can make the change programmatically with an M/R job that does a near-identity transformation, e.g. a streaming job running a shell script that calls sed. The gist of it is that you need to create new files; HDFS files cannot be edited in place.
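As a rough illustration of the "write a new file" idea, here is a minimal Python sketch of a stream filter that copies its input to its output while discarding the first line. It is not a full M/R job: it assumes the data is piped through a single process, and the function name `drop_first_line` and the `--stdin` flag are made up for this example. Note that in a streaming job with several mappers each mapper only sees its own split, so a naive "skip line 1" filter would drop one line per split rather than just the first row of the file.

```python
import shutil
import sys


def drop_first_line(src, dst, chunk_size=1 << 20):
    """Copy the binary stream src to dst, skipping the first line.

    Everything up to and including the first newline is discarded;
    the rest is copied in chunks so the file never has to fit in memory.
    """
    src.readline()  # read and throw away the first row
    shutil.copyfileobj(src, dst, chunk_size)


# Hypothetical command-line entry point: only acts as a pipe filter
# when started with the (made-up) --stdin flag.
if __name__ == "__main__" and "--stdin" in sys.argv[1:]:
    drop_first_line(sys.stdin.buffer, sys.stdout.buffer)
```

Assuming the script is saved as `drop_first_line.py` and the paths below are placeholders, it could be used as `hdfs dfs -cat /data/in.csv | python3 drop_first_line.py --stdin | hdfs dfs -put - /data/out.csv`, where `-put -` reads from stdin. This still writes a brand-new HDFS file; it only avoids keeping a local copy of the 50 GB.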
Upvotes: 3