user2783058

Reputation: 43

Modifying files on HDFS using MapReduce

Can I modify files that reside on HDFS? Or is the only option to write a temporary file with the modified content and then drop the original file?

Can I modify a file using MapReduce? Can different blocks of a file be modified in parallel and then somehow be combined into a single file?

Upvotes: 0

Views: 469

Answers (1)

Stephen ODonnell

Reputation: 4466

You cannot modify a file once it is in HDFS, except by appending to it. This answer confirms that appending is possible:

Append data to existing file in HDFS Java

MapReduce lets you operate on a file in parallel: each mapper reads one input split of the file (by default, roughly one HDFS block), and many mappers run at once. This is how it is designed to work.

Any given mapper can filter rows and write all, some, or none of them to a new file fairly easily.
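
As a rough illustration of that filter-and-rewrite idea, here is a minimal sketch of the core of a mapper that could be run under Hadoop Streaming, where each mapper receives its share of the input as lines on standard input and emits lines on standard output. The `keep` and `transform` rules are hypothetical placeholders, not something from the question; substitute whatever modification you actually need.

```python
import sys


def keep(line):
    """Hypothetical filter rule: drop blank lines and lines starting with '#'."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def transform(line):
    """Hypothetical modification: upper-case the whole record."""
    return line.upper()


def map_lines(source):
    """Yield the modified form of every record in `source` that passes the filter.

    `source` is any iterable of text lines, e.g. sys.stdin inside a mapper.
    """
    for raw in source:
        line = raw.rstrip("\n")
        if keep(line):
            yield transform(line)


def main():
    # Under Hadoop Streaming the mapper's input arrives on stdin and
    # whatever is printed to stdout becomes the mapper's output.
    for out in map_lines(sys.stdin):
        print(out)
```

To use this as a streaming mapper, the script would call `main()` when executed. Running it as a map-only job (zero reducers) means each mapper's output is written directly as its own file in the job's output directory, which matches the "new file" behaviour described above.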

If you use MapReduce to write out the modified file, the output will by default appear as a directory of part files, which can then be combined into a single file if you need one.
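
To show what combining the pieces amounts to, here is a small sketch that concatenates the part files of a job output directory into one file. It assumes the output directory has already been copied to the local filesystem and that the files follow the usual `part-*` naming; the function name and paths are illustrative only. If you only need the merged result locally, `hadoop fs -getmerge <hdfs_dir> <local_file>` does the same job directly from HDFS.

```python
from pathlib import Path


def merge_parts(output_dir, merged_path):
    """Concatenate part-* files from output_dir into merged_path.

    Files are appended in name order (part-...-00000, part-...-00001, ...).
    Marker files such as _SUCCESS do not match the pattern and are skipped.
    Returns the total number of bytes written.
    """
    parts = sorted(Path(output_dir).glob("part-*"))
    total = 0
    with open(merged_path, "wb") as merged:
        for part in parts:
            data = part.read_bytes()
            merged.write(data)
            total += len(data)
    return total
```

The merged file should have a name that does not itself start with `part-` if it is written into the same directory, so that a second run does not pick it up as input.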

Upvotes: 1
