Reputation: 153
The CSV files that I'm dealing with look like this:
{http://www.omg.org/XMI}id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear
I am trying to create a dictionary of lists, with the 'Emotion' entries as keys and, for each key, the 'begin' values (second column) of the rows in which that emotion occurs.
Desired output would look like this:
{'anger': [1578, 2853, 3951, ...],
 'anticipation': [772, 4154, 4400, ...],
 ...}
So far I get almost the desired output, but each value is wrapped in its own single-element list inside the list for its key.
My current code:
import pickle
from pprint import pprint
import tkinter
from tkinter import filedialog
import csv
from itertools import groupby
root_tk = tkinter.Tk()
root_tk.wm_withdraw()
def extract_gold_emotions():
    """Returns mapping of GOLD emotions to their indices"""
    filename = filedialog.askopenfilename()
    l = list(csv.reader(open(filename)))
    f = lambda x: x[-1]
    gold_emo_offsets = {k:list(sorted(map(int, x[1:2])) for x in v)\
                        for k,v in groupby(sorted(l[1:], key=f), f)}
    pickle.dump(gold_emo_offsets, open("empos.p", "wb"))
    return gold_emo_offsets
my_emotions = extract_gold_emotions()
Current output:
{'anger': [[1578], [2853], [3951], [4084], [4693], [6420], [8050]],
'anticipation': [[772], [4154], [4400], [7392]],....]]}
Any hints on what to change in the code to output my desired dictionary of lists?
Thanks in advance!
EDIT:
The dictionary values should be outputted as integers.
Upvotes: 1
Views: 53
Reputation: 51683
Using just basic Python, no imports (*):
Write file:
with open("data.csv","w") as w:
    w.write("""{http://www.omg.org/XMI}id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear
""")
Read and process file:
d = {}
with open("data.csv","r") as r:
    next(r) # skip header
    for line in r:
        if line.strip(): # ignore empty lines (e.g. the last one)
            l = line.strip().split(",")
            begin = l[1] # the begin column
            emo = l[-1] # the emotion column
            k = d.setdefault(emo,[]) # get/create key + empty list if needed
            k.append(begin) # append to key as string
            # k.append(int(begin)) # append to key but convert to int first
print(d)
Output (appended as string):
{'anger': ['1578'],
'surprise': ['1534', '1534', '1534', '1534'],
'fear': ['1611'],
'anticipation': ['772'],
'disgust': ['772', '1345']}
(*): You should not parse CSV yourself if it contains things like quoted text or escaped/inline separator characters. Your data is plain, though, so you can parse it yourself.
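To illustrate the caveat, here is a minimal sketch of what goes wrong with a manual split once a field contains a quoted comma. The sample line is made up for the illustration (it is not from your file); csv.reader is the standard-library parser.

```python
import csv
from io import StringIO

# Hypothetical sample: the Emotion field contains a comma inside quotes.
sample = 'id,begin,end,Emotion\n1,10,20,"joy, mild"\n'

header, data_line = sample.strip().split("\n")

# A plain split treats the comma inside the quotes as a separator.
naive_fields = data_line.split(",")

# csv.reader honours the quoting and keeps the field in one piece.
rows = list(csv.reader(StringIO(sample)))
csv_fields = rows[1]

print(len(naive_fields), naive_fields)
print(len(csv_fields), csv_fields)
```

The manual split yields five fields with the emotion cut in two, while csv.reader returns four fields with 'joy, mild' intact.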
Upvotes: 1
Reputation: 46901
You could use collections.defaultdict to build your result dictionary:
from io import StringIO
import csv
from collections import defaultdict
text = '''id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear'''
data = defaultdict(list)
with StringIO(text) as file:
    for row in csv.DictReader(file):
        data[row['Emotion']].append(row['begin'])
print(data)
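Since the question's EDIT asks for integer values, a small variation of the same idea might look like the sketch below. It assumes every 'begin' cell holds a valid integer (int() raises ValueError otherwise) and uses a shortened copy of the sample data; the name result is only for illustration.

```python
import csv
from collections import defaultdict
from io import StringIO

# Shortened copy of the sample data, header simplified to 'id'.
text = '''id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17472,1578,1602,anger
'''

data = defaultdict(list)
for row in csv.DictReader(StringIO(text)):
    # Convert while appending so the lists hold ints, not strings.
    data[row['Emotion']].append(int(row['begin']))

# Optional: turn the defaultdict into a plain dict for printing/pickling.
result = dict(data)
print(result)
# {'anticipation': [772], 'disgust': [772, 1345], 'anger': [1578]}
```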
Upvotes: 1
Reputation: 82785
Using collections.defaultdict and csv.DictReader, for example:
import csv
import collections
d = collections.defaultdict(list)
with open(filename) as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        d[row["Emotion"]].append(row["begin"])
print(d)
Output:
defaultdict(<type 'list'>, {'anger': ['1578'], 'surprise': ['1534', '1534', '1534', '1534'], 'fear': ['1611'], 'anticipation': ['772'], 'disgust': ['772', '1345']})
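If you would rather keep the groupby approach from the question, the nesting seems to come from calling list(...) on a generator that produces one small sorted list per row (x[1:2] is a one-element slice). A possible fix, sketched here on a shortened copy of the sample data with a hypothetical helper named emotion, is to take int(row[1]) per row and sort the whole group once:

```python
import csv
from io import StringIO
from itertools import groupby

# Shortened copy of the sample data from the question.
text = """{http://www.omg.org/XMI}id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17472,1578,1602,anger
17525,1611,1617,fear
"""

rows = list(csv.reader(StringIO(text)))[1:]  # drop the header row

def emotion(row):
    """Grouping key: the last column."""
    return row[-1]

# groupby only merges adjacent rows, so sort by the same key first.
gold_emo_offsets = {
    key: sorted(int(row[1]) for row in group)  # one int per row, no inner lists
    for key, group in groupby(sorted(rows, key=emotion), key=emotion)
}
print(gold_emo_offsets)
# {'anger': [1578], 'anticipation': [772], 'disgust': [772, 1345], 'fear': [1611]}
```

This gives a plain dict of integer lists, so it can be pickled the same way as in the question.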
Upvotes: 1