RaviR

Reputation: 1

Parsing a string into a dictionary with two keys and a value

I have a Python program, prog1 (the mapper), that prints three fields. It ends with

 print user, text, rt

The first field is the username, the second is the tweet text, and the third is the number of retweets. I'm trying to find the top N tweets by retweet count.

Below is some example output:

inocybetech RT @ONAPproject: #ONAPAmsterdam is here! This first code release delivers a unified architecture for end-to-end, closed-loop…  5
jchawki RT @ONAPproject: #ONAPAmsterdam is here! This first code release delivers a unified architecture for end-to-end, closed-loop…  6
jchawki RT @opnfv: Congrats to @ONAPproject on Amsterdam, on its 1st platform release! Learn more about its unified architecture for e…  2
jchawki RT @jzemlin: Now Available! #ONAP Amsterdam brings globally shared implementation for network automation, based on OSS & open st…  3
jchawki RT @bdwick: Now Available! #ONAP Amsterdam brings globally shared implementation for network automation, based on OSS & open st…  1

I am piping this into another Python program, prog2 (the reducer), via stdin. My problem is figuring out how to read each line into a dictionary whose key has two parts (user and text) and whose value is the retweet count.

If I use

for line in sys.stdin:

`line` does not capture the entire string. What I need is to capture the three fields in a dictionary with two keys and one value.
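If you would rather keep prog1's space-delimited output, one workaround is to split off the first field (the username, which cannot contain spaces) and the last field (the retweet count) and treat everything in between as the tweet text. This is a minimal sketch, assuming each record fits on one line (no newlines inside the tweet) and the count is always the last whitespace-separated token; `parse_line` and `build_stats` are hypothetical helper names, not part of any library.

```python
def parse_line(line):
    """Split 'user <tweet text> count' into (user, text, count).

    Assumes the username has no spaces and the retweet count is the
    last whitespace-separated token on the line.
    """
    user, rest = line.strip().split(None, 1)  # first token: the username
    text, rt = rest.rsplit(None, 1)  # last token: the retweet count
    return user, text, int(rt)


def build_stats(lines):
    """Map (user, text) -> retweet count for every non-blank line."""
    stat = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        user, text, rt = parse_line(line)
        stat[(user, text)] = rt  # two-part key, one value
    return stat
```

In prog2 you could then write `stat = build_stats(sys.stdin)` (after `import sys`). This breaks as soon as a tweet contains a newline, which is why a quoted format such as CSV is more robust.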

Can you suggest something? I am just starting to learn Python.

Thanks

Upvotes: 0

Views: 63

Answers (1)

ewcz

Reputation: 13087

It's probably better to use a format that is easier to parse. If you print everything space-delimited, it can get quite complicated to separate the individual fields afterwards, since the tweet text contains spaces (and perhaps even newlines).

One option would be to generate/parse CSV (this has the additional advantage that you can use your output easily with other software supporting CSV input).
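To see why CSV helps, the sketch below writes one made-up record, whose text contains both spaces and a newline, with `csv.writer` (space-delimited, as in the code that follows) and reads it back with `csv.reader`. Here `io.StringIO` stands in for the pipe between the two programs; it is only an illustration of the quoting behaviour.

```python
import csv
import io

# A made-up record whose text field contains spaces and a newline.
record = ["jchawki", "RT @opnfv: Congrats\non Amsterdam", "2"]

buf = io.StringIO()  # stands in for the stdout -> stdin pipe
writer = csv.writer(buf, delimiter=" ")
writer.writerow(record)  # the text field is quoted automatically

print(repr(buf.getvalue()))  # shows the quoted, space-delimited row

buf.seek(0)  # rewind so the reader starts at the beginning
rows = list(csv.reader(buf, delimiter=" "))
print(rows == [record])  # the record survives the round trip intact
```

Because the writer wraps the text in quotes, the reader knows that the embedded spaces and newline belong to a single field.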

So the writer (csvw.py) could look roughly like this:

import csv
import sys

# Space-delimited CSV: fields containing spaces or newlines get quoted
writer = csv.writer(sys.stdout, delimiter=' ')

writer.writerow(['Name', 'Content\nof the message', 12])

and the reader (csvr.py):

import csv
import sys

reader = csv.reader(sys.stdin, delimiter=' ')
stat = {}
for record in reader:
    name, message, cnt = record

    # Two-part key (name, message) maps to the count as an int
    key = (name, message)
    stat[key] = int(cnt)

print(stat)

Then, if you run:

python csvw.py | python csvr.py

you get:

{('Name', 'Content\nof the message'): 12}
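Since the goal in the question is the top N retweets, here is one possible next step once the dictionary is built: `heapq.nlargest` picks the N entries with the highest counts. This is a sketch; `top_n` is a hypothetical helper, and the sample dictionary uses the counts from the question's example with the tweet texts shortened.

```python
import heapq


def top_n(stat, n):
    """Return the n ((user, text), count) pairs with the highest counts."""
    return heapq.nlargest(n, stat.items(), key=lambda item: item[1])


# Sample data in the shape the reader builds; tweet texts are shortened.
stat = {
    ("inocybetech", "RT @ONAPproject: #ONAPAmsterdam is here!"): 5,
    ("jchawki", "RT @ONAPproject: #ONAPAmsterdam is here!"): 6,
    ("jchawki", "RT @opnfv: Congrats to @ONAPproject on Amsterdam"): 2,
    ("jchawki", "RT @jzemlin: Now Available! #ONAP Amsterdam"): 3,
}

for (user, text), count in top_n(stat, 3):
    print(count, user, text)
```

Note that `stat[key] = int(cnt)` keeps only the last count seen for a repeated key; if the same (name, message) pair can appear more than once, `stat[key] = max(stat.get(key, 0), int(cnt))` keeps the largest instead.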

Upvotes: 1
