Reputation: 158
This is a continuation of my previous question:
How to print only if a character is an alphabet?
I now have a mapper that works correctly. When I run it on a text file containing the string "It's a beautiful life", it gives me this output:
i 1 1 0
t 1 0 0
s 1 0 0
a 1 1 1
b 1 0 0
e 1 0 0
a 1 0 0
u 1 0 0
t 1 0 0
i 1 0 0
f 1 0 0
u 1 0 0
l 1 0 0
l 1 0 0
i 1 0 0
f 1 0 0
e 1 0 1
Now I am trying to send this output into a script to get an output like this:
a [(1, 0, 0), (1, 1, 1)]
b [(1, 0, 0)]
e [(1, 0, 0), (1, 0, 1)]
f [(1, 0, 0), (1, 0, 0)]
i [(1, 0, 0), (1, 0, 0), (1, 1, 0)]
l [(1, 0, 0), (1, 0, 0)]
s [(1, 0, 0)]
t [(1, 0, 0), (1, 0, 0)]
u [(1, 0, 0), (1, 0, 0)]
so that a tuple is appended to a letter's list each time that letter appears in the mapper output.
I have some code that was from a different but similar problem that I am trying to change around so it works with my mapper:
from itertools import groupby
from operator import itemgetter
import sys

def read_mapper_output(file):
    for line in file:
        yield line.strip().split(' ')

# Call the function to read the input which is (<WORD>, 1)
data = read_mapper_output(sys.stdin)

# Each word becomes key and is used to group the rest of the values by it.
# The first argument is the data to be grouped
# The second argument is what it should be grouped by. In this case it is the <WORD>
for key, keygroup in groupby(data, itemgetter(0)):
    values = ' '.join(sorted(v for k, v in keygroup))
    print("%s %s" % (key, values))
I am having trouble changing the last block of code to work with my mapper. I know that for each letter I will have to print a list containing one tuple for every occurrence of that letter in the mapper output.
Upvotes: 0
Views: 104
Reputation: 158
I was able to answer my own question doing this:
from itertools import groupby
from operator import itemgetter
import sys

def read_mapper_output(file):
    for line in file:
        yield line.strip().split(' ')

# Call the function to read the input which is (<WORD>, 1)
data = read_mapper_output(sys.stdin)

# Each word becomes key and is used to group the rest of the values by it.
# The first argument is the data to be grouped
# The second argument is what it should be grouped by. In this case it is the <WORD>
for key, keygroup in groupby(data, itemgetter(0)):  # key = alphabetical letters, keygroup = groupby objects, need to be unpacked?
    values = []
    values.append(sorted((v, x, y) for k, v, x, y in keygroup))
    my_list = next(iter(values))
    print("%s %s" % (key, my_list))
I only had to change the last block, and I am sure this is spaghetti code that could be optimized, but I'm not very good at Python.
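For anyone who wants a tidier version, here is a minimal sketch of the same idea. It makes a few assumptions that are not in my original script: the function name reduce_letters and the sample list are made up for illustration, the counts are converted to int so the tuples print as (1, 0, 0) rather than ('1', '0', '0'), and the records are sorted by letter inside the function because groupby only merges adjacent equal keys. If something upstream already sorts the mapper output by key, that extra sort is harmless.

```python
from itertools import groupby
from operator import itemgetter


def read_mapper_output(lines):
    """Yield (letter, (a, b, c)) pairs from lines such as 'i 1 1 0'."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # assumption: silently skip blank or malformed lines
        letter, *counts = fields
        yield letter, tuple(int(c) for c in counts)


def reduce_letters(lines):
    """Return [(letter, sorted list of tuples), ...] in alphabetical order."""
    # groupby only groups adjacent records, so sort by letter first.
    records = sorted(read_mapper_output(lines), key=itemgetter(0))
    return [
        (letter, sorted(values for _, values in group))
        for letter, group in groupby(records, key=itemgetter(0))
    ]


# Hypothetical sample; in the real script, pass sys.stdin instead.
sample = ["i 1 1 0", "t 1 0 0", "a 1 1 1", "i 1 0 0", "a 1 0 0"]
for letter, values in reduce_letters(sample):
    print("%s %s" % (letter, values))
```

The values list, the append, and the next(iter(...)) in my version are not needed, since sorted() already returns the list that gets printed.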
Upvotes: 0