Reputation: 11
I am having an issue with MapReduce. I have to read multiple CSV files, and each CSV file produces exactly one output row.
I cannot let my custom input format split the CSV files, because the rows within a file are not in the same format. For example:
row 1 contains A, B, C
row 2 contains D, E, F
and my output value should be something like A, B, D, F, so a single mapper has to see the whole file.
I have 1100 CSV files, so 1100 splits and hence 1100 mappers are created. The mappers are very simple and shouldn't take much time to process their input, yet the job takes a long time to get through the 1100 files.
Can anyone please guide me on what to look at, or point out whether I am doing anything wrong in this approach?
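For context, a minimal sketch of such a mapper, assuming the custom input format delivers each whole file to map() as a single BytesWritable value (the types and field positions here are illustrative, not the actual code):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Assumes the custom input format hands map() the complete CSV file as one value,
// so both rows of a file are visible in a single call.
public class CsvFileMapper extends Mapper<NullWritable, BytesWritable, NullWritable, Text> {

    @Override
    protected void map(NullWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        String[] rows = new String(value.copyBytes(), StandardCharsets.UTF_8).split("\r?\n");
        String[] row1 = rows[0].split(",");   // A, B, C
        String[] row2 = rows[1].split(",");   // D, E, F
        // Pick A, B from row 1 and D, F from row 2 -> the single output row A, B, D, F.
        String out = row1[0].trim() + ", " + row1[1].trim() + ", "
                   + row2[0].trim() + ", " + row2[2].trim();
        context.write(NullWritable.get(), new Text(out));
    }
}
```

With this shape, each map() call only does a couple of string splits, which is why the per-mapper work itself is cheap.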
Upvotes: 1
Views: 367
Reputation: 790
Hadoop performs better with a small number of large files, as opposed to a huge number of small files. ("Small" here means significantly smaller than a Hadoop Distributed File System (HDFS) block.) The technical reasons for this are well explained in this Cloudera blog post:
Map tasks usually process a block of input at a time (using the default FileInputFormat). If the file is very small and there are a lot of them, then each map task processes very little input, and there are a lot more map tasks, each of which imposes extra bookkeeping overhead. Compare a 1GB file broken into 16 64MB blocks, and 10,000 or so 100KB files. The 10,000 files use one map each, and the job time can be tens or hundreds of times slower than the equivalent one with a single input file.
You can refer to this link for methods of solving this issue; one common method is sketched below.
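For illustration, one such method is to pack many small files into each split with Hadoop's CombineFileInputFormat while still handing the mapper one whole file per record, which matches the one-file-one-row requirement above. A minimal sketch under that assumption (class names are illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

// Packs many small CSV files into each split while still delivering one whole
// file per record, so far fewer map tasks are launched.
public class CombineWholeFileInputFormat extends CombineFileInputFormat<Text, BytesWritable> {

    public CombineWholeFileInputFormat() {
        setMaxSplitSize(128 * 1024 * 1024); // cap the data per combined split (~1 HDFS block)
    }

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // keep each CSV file intact
    }

    @Override
    public RecordReader<Text, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        return new CombineFileRecordReader<>((CombineFileSplit) split, context, WholeFileReader.class);
    }

    // Reads the idx-th file of the combined split as a single record:
    // key = file path, value = full file contents.
    public static class WholeFileReader extends RecordReader<Text, BytesWritable> {
        private final CombineFileSplit split;
        private final TaskAttemptContext context;
        private final int idx;
        private final Text key = new Text();
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        public WholeFileReader(CombineFileSplit split, TaskAttemptContext context, Integer idx) {
            this.split = split;
            this.context = context;
            this.idx = idx;
        }

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) { }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false;
            }
            Path path = split.getPath(idx);
            byte[] contents = new byte[(int) split.getLength(idx)];
            FileSystem fs = path.getFileSystem(context.getConfiguration());
            FSDataInputStream in = null;
            try {
                in = fs.open(path);
                IOUtils.readFully(in, contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            key.set(path.toString());
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override
        public Text getCurrentKey() { return key; }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() { return processed ? 1.0f : 0.0f; }

        @Override
        public void close() { }
    }
}
```

The driver would then use job.setInputFormatClass(CombineWholeFileInputFormat.class), and the mapper's input key becomes the file path as Text. With 1100 tiny files this typically collapses the job from 1100 map tasks to a handful, each of which processes many files.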
Upvotes: 1