Swapnil

Reputation: 169

MapReduce using Hadoop Streaming via Python - pass a list from mapper to reducer and read it as a list

I want to pass a list as the value from the mapper to the reducer stage. Currently, the reducer reads the list as a string. Is there a way to make sure that Python interprets it as a list?

Upvotes: 0

Views: 671

Answers (2)

Swapnil

Reputation: 169

Use ast.literal_eval(str_val) in the reducer, where str_val is the value string read from stdin (pass the variable itself, not a quoted name); it will convert the string to a list. For more information, refer to https://docs.python.org/2/library/ast.html
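
A minimal sketch of what that could look like in the reducer, assuming the mapper printed each record as key<TAB>value with the value being the list's Python literal text (for example word<TAB>[1, 2, 3]). The function name parse_record and the tab-separated layout are assumptions for illustration, not something stated in the question:

```python
import ast


def parse_record(line):
    """Split a 'key<TAB>value' line and turn the value text back into a list.

    Hypothetical helper: assumes the mapper wrote the list with print/repr,
    so the value part looks like "[1, 2, 3]".
    """
    key, _, raw_value = line.rstrip("\n").partition("\t")
    # literal_eval only accepts Python literals; it does not run arbitrary code
    value = ast.literal_eval(raw_value)
    if not isinstance(value, list):
        raise ValueError("expected a list literal, got: {!r}".format(raw_value))
    return key, value
```

In the reducer this would be called once per line, e.g. for line in sys.stdin: key, values = parse_record(line).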

Upvotes: 1

carpenter

Reputation: 1210

Hadoop streaming uses stdin and stdout for its communication; therefore, everything coming into each subsequent job will be a string. You can use some kind of delimiter in your representation such as a comma:

the, items, in, my, list

and then split them in your reducer:

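import sys
for line in sys.stdin:
    # Split on commas and strip surrounding whitespace (including the newline)
    data = [item.strip() for item in line.split(',')]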
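
To show both ends of the delimiter approach, here is a rough sketch of a mapper-side encoder and a reducer-side decoder. The names encode_list and decode_list, the tab between key and value, and the default comma separator are assumptions for illustration; it also assumes no item contains the separator itself:

```python
def encode_list(key, items, sep=","):
    """Mapper side: build one 'key<TAB>item1,item2,...' output line."""
    return "{}\t{}".format(key, sep.join(str(item) for item in items))


def decode_list(line, sep=","):
    """Reducer side: recover the key and the list of (string) items."""
    key, _, raw = line.rstrip("\n").partition("\t")
    # Strip whitespace so "the, items" and "the,items" decode the same way
    items = [part.strip() for part in raw.split(sep)] if raw else []
    return key, items
```

Note that every item comes back as a string with this approach, so numbers need an explicit int() or float() conversion in the reducer.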

and if you want it to be a dictionary:

import ast
import sys

for line in sys.stdin:
    # e.g. line is "{'waffle': 'delicious', 'pancake': 'mediocre'}"
    record = ast.literal_eval(line.strip())

There is no way to know in advance that the value is a list, though, because you are only reading text from the standard input stream.
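
Since the stream itself carries no type information, one option is to attempt the parse and fall back to the raw string when it fails. This is only a sketch; parse_value is a hypothetical helper name, and it assumes that values which are not valid Python literals should simply be kept as text:

```python
import ast


def parse_value(raw):
    """Return the parsed Python literal if raw is one, else the raw string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a literal (plain words, empty string, truncated list, ...)
        return raw
```

The reducer can then check isinstance(value, list) before treating the result as a list.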

Upvotes: 1
