Reputation: 135
I split a dialogue into two dictionaries, each containing the words that one person says (there are two people). I have to print 4 columns (keyword, count from the first dictionary (how many times the first person uses that word), count from the second dictionary, and the total of the two), ordered by keyword. Can somebody help me? The output has to look like this:
african 1 0 1
air-speed 1 0 0
an 1 1 2
arthur 1 0 1
...
This is the text I have:
text = """Bridgekeeper: Hee hee heh. Stop. What... is your name?
King Arthur: It is 'Arthur', King of the Britons.
Bridgekeeper: What... is your quest?
King Arthur: To seek the Holy Grail.
Bridgekeeper: What... is the air-speed velocity of an unladen swallow?
King Arthur: What do you mean? An African or European swallow?"""
Output of bridgekeeper_w and arthur_w:
print (bridgekeeper_w)
{'hee': 2, 'heh': 1, 'stop': 1, 'what': 3, 'is': 3, 'your': 2, 'name': 1, 'quest': 1, 'the': 1, 'air-speed': 1, 'velocity': 1, 'of': 1, 'an': 1, 'unladen': 1, 'swallow': 1}
print (arthur_w)
{'king': 4, 'it': 1, 'is': 1, 'arthur': 1, 'of': 1, 'the': 2, 'britons': 1, 'to': 1, 'seek': 1, 'holy': 1, 'grail': 1, 'what': 1, 'do': 1, 'you': 1, 'mean': 1, 'an': 1, 'african': 1, 'or': 1, 'european': 1, 'swallow': 1}
Now I need this (keyword, count from the first dict, count from the second dict, and the total):
african 1 0 1
air-speed 1 0 0
an 1 1 2
arthur 1 0 1
...
Upvotes: 1
Views: 632
Reputation: 189387
If you already have two dictionaries, the main problem is how to loop over the keys which are in either dictionary. But that's not hard:
for key in sorted(set(list(bridgekeeper_w.keys()) + list(arthur_w.keys()))):
    b_count = 0 if key not in bridgekeeper_w else bridgekeeper_w[key]
    a_count = 0 if key not in arthur_w else arthur_w[key]
    print('%-20s %3i %3i %3i' % (key, b_count, a_count, b_count+a_count))
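The same loop can be written a little more compactly with a set union of the keys and `dict.get` with a default of 0. This is only a sketch: the shortened sample dictionaries and the `merged_rows` helper are made up for illustration and are not part of the question.

```python
# Shortened sample data (assumption: a small subset in the style of the question's dictionaries).
bridgekeeper_w = {'what': 3, 'an': 1, 'air-speed': 1}
arthur_w = {'what': 1, 'an': 1, 'african': 1}


def merged_rows(first, second):
    """Return (key, first_count, second_count, total) tuples sorted by key."""
    rows = []
    # Iterating a dict yields its keys, so set(first) | set(second) is the union of keys.
    for key in sorted(set(first) | set(second)):
        f_count = first.get(key, 0)   # 0 when the key is missing
        s_count = second.get(key, 0)
        rows.append((key, f_count, s_count, f_count + s_count))
    return rows


for row in merged_rows(bridgekeeper_w, arthur_w):
    print('%-20s %3i %3i %3i' % row)
```

Returning the rows as tuples keeps the counting separate from the printing, so the same data could also be written to a file.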
If the integrity of the dictionaries is not important, a more elegant solution might be to add the missing keys to one of the dictionaries, and then simply loop over all its keys.
for key in arthur_w.keys():
    if key not in bridgekeeper_w:
        bridgekeeper_w[key] = 0
for key, b_count in sorted(bridgekeeper_w.items()):
    a_count = 0 if key not in arthur_w else arthur_w[key]
    print('%-20s %3i %3i %3i' % (key, b_count, a_count, b_count+a_count))
This does away with the rather tedious and slightly complex set(list(keys()...)) of the first solution, at the cost of traversing one of the dictionaries twice.
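If modifying `bridgekeeper_w` is a concern, `collections.Counter` might be another option: a Counter returns 0 for a missing key without inserting it, and adding two Counters produces a new one with the summed counts. A minimal sketch, with made-up sample data and the hypothetical names `bridgekeeper_c` and `arthur_c`:

```python
from collections import Counter

# Sample data (assumption: a small subset in the style of the question's dictionaries).
bridgekeeper_c = Counter({'what': 3, 'an': 1, 'air-speed': 1})
arthur_c = Counter({'what': 1, 'an': 1, 'african': 1})

# Adding Counters builds a new Counter; neither input is modified.
total = bridgekeeper_c + arthur_c

lines = []
for key in sorted(total):
    # Indexing a Counter with a missing key gives 0 and does not add the key.
    lines.append('%-20s %3i %3i %3i' % (key, bridgekeeper_c[key], arthur_c[key], total[key]))
print('\n'.join(lines))
```

Note that Counter addition drops entries whose summed count is zero or negative, which does not matter for word counts that are all positive.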
Upvotes: 2
Reputation: 5992
Or a solution without third-party libraries:
bridgekeeper_d = {'hee': 2, 'heh': 1, 'stop': 1, 'what': 3, 'is': 3, 'your': 2, 'name': 1, 'quest': 1, 'the': 1, 'air-speed': 1, 'velocity': 1, 'of': 1, 'an': 1, 'unladen': 1, 'swallow': 1}
arthur_d = {'king': 4, 'it': 1, 'is': 1, 'arthur': 1, 'of': 1, 'the': 2, 'britons': 1, 'to': 1, 'seek': 1, 'holy': 1, 'grail': 1, 'what': 1, 'do': 1, 'you': 1, 'mean': 1, 'an': 1, 'african': 1, 'or': 1, 'european': 1, 'swallow': 1}
# Give every key its own inner dict; dict.fromkeys(keys, {}) would share one dict between all keys.
joined = {key: {"bridgekeeper": 0, "arthur": 0} for key in list(bridgekeeper_d) + list(arthur_d)}
for key, value in bridgekeeper_d.items():
    joined[key]["bridgekeeper"] = value
for key, value in arthur_d.items():
    joined[key]["arthur"] = value
# At this point, joined looks like this:
# {
#     'hee': {'bridgekeeper': 2, 'arthur': 0},
#     'heh': {'bridgekeeper': 1, 'arthur': 0},
#     'stop': {'bridgekeeper': 1, 'arthur': 0},
#     'what': {'bridgekeeper': 3, 'arthur': 1}
#     ...
# }
for key, dic in joined.items():
    print("%-15s %d %d %d" % (key, dic["bridgekeeper"], dic["arthur"], dic["bridgekeeper"] + dic["arthur"]))
Output:
hee             2 0 2
heh             1 0 1
stop            1 0 1
what            3 1 4
is              3 1 4
your            2 0 2
name            1 0 1
quest           1 0 1
the             1 2 3
air-speed       1 0 1
velocity        1 0 1
of              1 1 2
an              1 1 2
unladen         1 0 1
swallow         1 1 2
king            0 4 4
it              0 1 1
arthur          0 1 1
britons         0 1 1
to              0 1 1
seek            0 1 1
holy            0 1 1
grail           0 1 1
do              0 1 1
you             0 1 1
mean            0 1 1
african         0 1 1
or              0 1 1
european        0 1 1
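A general caveat when building a dictionary of dictionaries such as `joined`: `dict.fromkeys(keys, {})` stores the same inner dict under every key, so a write through one key shows up under all of them, whereas a dict comprehension creates a fresh inner dict per key. A small sketch with made-up keys:

```python
keys = ['hee', 'king']

# dict.fromkeys evaluates the default once, so every key refers to one shared dict.
shared = dict.fromkeys(keys, {})
shared['hee']['bridgekeeper'] = 2
print(shared['king'])    # the write made through 'hee' is visible here as well

# A dict comprehension evaluates the value expression once per key.
separate = {key: {} for key in keys}
separate['hee']['bridgekeeper'] = 2
print(separate['king'])  # still empty
```

With immutable defaults such as `0` or `None`, `dict.fromkeys` does not have this problem.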
Upvotes: 0
Reputation: 14949
There are a few steps to get to the dataframe below.
At the end of them, we'll have a dictionary of Counter objects like this:
{'Bridgekeeper': Counter({'Hee': 1,
'hee': 1,
'heh': 1,
'Stop': 1,
'What': 3,
'is': 3,
'your': 2,
'name': 1,
'quest': 1,
'the': 1,
'airspeed': 1,
'velocity': 1,
'of': 1,
'an': 1,
'unladen': 1,
'swallow': 1}),
This dictionary can be transformed into the required output very easily if we load it into a dataframe.
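import string
from collections import Counter, defaultdict

import pandas as pd

# `text` is the dialogue string from the question.
result = defaultdict(list)
for row in text.split('\n'):
    # Split only on the first colon, so any colon inside the speech is kept.
    speaker, speech = row.split(':', 1)
    result[speaker.strip()].append(speech.strip())
# Join each speaker's lines and strip punctuation (this also turns 'air-speed' into 'airspeed').
result = {key: ' '.join(value).translate(str.maketrans('', '', string.punctuation)) for key, value in result.items()}
# Count the words per speaker; the counts are case-sensitive.
result = {key: Counter(value.split()) for key, value in result.items()}
df = pd.DataFrame(result).fillna(0).astype(int)
df['sum'] = df['Bridgekeeper'] + df['King Arthur']
df.to_csv('out.csv', sep='\t')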
Output dataframe:
Bridgekeeper King Arthur sum
Hee 1 0 1
hee 1 0 1
heh 1 0 1
Stop 1 0 1
What 3 1 4
is 3 1 4
your 2 0 2
name 1 0 1
quest 1 0 1
the 1 2 3
airspeed 1 0 1
velocity 1 0 1
of 1 1 2
an 1 0 1
unladen 1 0 1
swallow 1 1 2
It 0 1 1
Arthur 0 1 1
King 0 1 1
Britons 0 1 1
To 0 1 1
seek 0 1 1
Holy 0 1 1
Grail 0 1 1
do 0 1 1
you 0 1 1
mean 0 1 1
An 0 1 1
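The table above is case-sensitive ('an' and 'An' are separate rows) and loses the hyphen in 'air-speed', while the question asks for lowercase keywords sorted alphabetically. One possible way to get closer to that, using only the standard library rather than pandas, is sketched below. The regular expression is just one assumed way to tokenize, the text is shortened to the last two lines of the dialogue, and the variable names are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Shortened sample: the last two lines of the dialogue from the question.
text = """Bridgekeeper: What... is the air-speed velocity of an unladen swallow?
King Arthur: What do you mean? An African or European swallow?"""

counts = defaultdict(Counter)
for line in text.splitlines():
    # Everything before the first colon is the speaker, the rest is the speech.
    speaker, _, speech = line.partition(':')
    # Lowercase first, then keep runs of letters, optionally joined by hyphens or apostrophes.
    words = re.findall(r"[a-z]+(?:[-'][a-z]+)*", speech.lower())
    counts[speaker.strip()].update(words)

bridgekeeper, arthur = counts['Bridgekeeper'], counts['King Arthur']
rows = [(word, bridgekeeper[word], arthur[word], bridgekeeper[word] + arthur[word])
        for word in sorted(set(bridgekeeper) | set(arthur))]
for row in rows:
    print('%-12s %d %d %d' % row)
```

Lowercasing before counting merges 'An' and 'an' into a single keyword, and the sorted key union gives the alphabetical order the question asks for.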
Upvotes: 0