Reputation: 1
I have a Python program, prog1 (the mapper), that prints three fields per line. It ends with
print user, text, rt
The first field is the username, the second is the tweet text, and the third is the number of retweets. I am trying to find the top N retweets.
Below is an example:
inocybetech RT @ONAPproject: #ONAPAmsterdam is here! This first code release delivers a unified architecture for end-to-end, closed-loop… 5
jchawki RT @ONAPproject: #ONAPAmsterdam is here! This first code release delivers a unified architecture for end-to-end, closed-loop… 6
jchawki RT @opnfv: Congrats to @ONAPproject on Amsterdam, on its 1st platform release! Learn more about its unified architecture for e… 2
jchawki RT @jzemlin: Now Available! #ONAP Amsterdam brings globally shared implementation for network automation, based on OSS & open st… 3
jchawki RT @bdwick: Now Available! #ONAP Amsterdam brings globally shared implementation for network automation, based on OSS & open st… 1
I am piping this into another Python program, prog2 (the reducer), via stdin. My problem is figuring out how to read it into a dictionary keyed by two fields (user and text), with the retweet count as the value.
If I write
for line in sys.stdin:
then line does not capture the entire string. What I need is to capture the three fields in a dictionary with two keys and one value.
Can you suggest something? I am just starting to learn Python.
Thanks
Upvotes: 0
Views: 63
Reputation: 13087
It's probably better to use a format that is easier to parse. If you print everything space-delimited, it can get quite complicated to separate the individual fields afterwards, since the text of the tweet contains spaces (and perhaps even newlines).
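To illustrate the difficulty, here is a minimal sketch of what parsing the existing space-delimited lines would take. It assumes the username never contains a space, the retweet count is always the last token, and the tweet text contains no newline. The helper name parse_line and the shortened sample line are made up for illustration.

```python
def parse_line(line):
    """Split 'user text... count' into (user, text, count).

    Assumes: no spaces in the user name, the count is the last
    space-separated token, and the text holds no newline.
    """
    # Peel the user name off the left, then the count off the right;
    # whatever remains in the middle is the tweet text.
    user, rest = line.rstrip("\n").split(" ", 1)
    text, count = rest.rsplit(" ", 1)
    return user, text, int(count)


# Shortened sample in the same shape as the mapper output
sample = "jchawki RT @opnfv: Congrats to @ONAPproject on Amsterdam 2\n"

stat = {}
user, text, count = parse_line(sample)
stat[(user, text)] = count  # a tuple works as a two-part dictionary key
print(stat)
```

This breaks as soon as a tweet contains a newline, which is why a properly quoted format is the safer choice.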
One option would be to generate/parse CSV (this has the additional advantage that you can use your output easily with other software supporting CSV input).
So the writer (csvw.py) could look roughly like this:
import csv
import sys
writer = csv.writer(sys.stdout, delimiter = ' ')
writer.writerow(['Name', 'Content\nof the message', 12])
and the reader (csvr.py):
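import csv
import sys
# Use the same delimiter as the writer; quoted fields may contain spaces or newlines
reader = csv.reader(sys.stdin, delimiter = ' ')
stat = {}
for record in reader:
    # Each record has exactly three fields: name, message, count
    name, message, cnt = record
    # A (name, message) tuple serves as the two-part dictionary key
    key = (name, message)
    stat[key] = int(cnt)
print(stat)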
then if you do:
python csvw.py | python csvr.py
you get:
{('Name', 'Content\nof the message'): 12}
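Once the counts are in a dictionary like stat, picking the top N entries could be sketched as below. This is only one possible approach, using heapq.nlargest from the standard library. The function name top_n and the placeholder tweet texts are made up for illustration; the users and counts mirror the first four example lines in the question.

```python
import heapq


def top_n(stat, n):
    """Return the n ((user, text), count) pairs with the largest counts."""
    # item[1] is the retweet count stored as the dictionary value
    return heapq.nlargest(n, stat.items(), key=lambda item: item[1])


# Placeholder data in the same shape the reader builds
stat = {
    ("inocybetech", "tweet A"): 5,
    ("jchawki", "tweet A"): 6,
    ("jchawki", "tweet B"): 2,
    ("jchawki", "tweet C"): 3,
}

for (user, text), count in top_n(stat, 2):
    print(user, text, count)
```

For a small dictionary, sorted(stat.items(), key=lambda item: item[1], reverse=True)[:n] gives the same result.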
Upvotes: 1