Waldkamel

Reputation: 153

Dictionary of lists from two specific columns of a csv file

The CSV files that I'm dealing with look like this:

{http://www.omg.org/XMI}id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear

I am trying to create a dictionary of lists with the 'Emotion' entries as keys and, for each key, the 'begin' values (second column) of every row in which that emotion occurs.

Desired output would look like this:

{'anger': [1578,
           2853,
           3951,...],
 'anticipation': [772, 4154, 4400...],
...}

So far I've managed to get close to the desired output, but each value ends up wrapped in its own single-element list inside the list for its key.

My current code:

import pickle
from pprint import pprint
import tkinter
from tkinter import filedialog
import csv
from itertools import groupby


root_tk = tkinter.Tk()
root_tk.wm_withdraw()

def extract_gold_emotions():
    """Returns mapping of GOLD emotions to their indices"""

    filename = filedialog.askopenfilename()

    l = list(csv.reader(open(filename)))

    f = lambda x: x[-1]

    gold_emo_offsets = {k:list(sorted(map(int, x[1:2])) for x in v)\
                           for k,v in groupby(sorted(l[1:], key=f), f)}

    pickle.dump(gold_emo_offsets, open("empos.p", "wb"))

    return gold_emo_offsets


my_emotions = extract_gold_emotions()

Current output:

{'anger': [[1578], [2853], [3951], [4084], [4693], [6420], [8050]],
 'anticipation': [[772], [4154], [4400], [7392]],....]]}

Any hints on what to change in the code to output my desired dictionary of lists?

Thanks in advance!

EDIT:

The dictionary values should be integers, not strings.

Upvotes: 1

Views: 53

Answers (3)

Patrick Artner

Reputation: 51683

Using just basic Python, no imports (*):

Write file:

with open("data.csv","w") as w:
    w.write("""{http://www.omg.org/XMI}id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear
""")

Read and process file:

d = {}
with open("data.csv","r") as r:
    next(r) # skip header
    for line in r:
        if line.strip(): # ignore empty lines (f.e. the last one)
            l = line.strip().split(",")
            begin = l[1] # the begin column
            emo = l[-1]  # the emotion column
            k = d.setdefault(emo,[]) # get/create key + empty list if needed
            k.append(begin)            # append to key as string
            # k.append(int(begin))     # append to key but convert to int first

print(d)            

Output (appended as string):

{'anger': ['1578'], 
 'surprise': ['1534', '1534', '1534', '1534'], 
 'fear': ['1611'], 
 'anticipation': ['772'], 
 'disgust': ['772', '1345']}

(*): You should not parse CSV yourself if it contains things like escaped text or inline/escaped separator characters. Your data is plain, though, so you can parse it yourself.
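To illustrate the footnote, here is a minimal sketch of the same loop built on csv.reader instead of split(","). The sample text is hypothetical: the last row is invented to show a quoted field containing a comma, which does not occur in the question's data. The values are converted with int() as the question's edit asks.

```python
import csv
from io import StringIO

# Hypothetical sample: the last row quotes an Emotion value that contains a comma.
sample = '''id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17500,900,910,"fear, mild"
'''

d = {}
reader = csv.reader(StringIO(sample))
next(reader)  # skip header
for row in reader:
    if row:  # csv.reader yields an empty list for blank lines
        # last column is the key, second column is stored as an int
        d.setdefault(row[-1], []).append(int(row[1]))

print(d)

# For comparison: a plain split breaks the quoted field apart at its comma.
naive_last = '17500,900,910,"fear, mild"'.split(",")[-1]
print(repr(naive_last))
```

With csv.reader the quoted value stays in one piece as a key, while the plain split leaves only its trailing fragment in the last column.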

Upvotes: 1

hiro protagonist

Reputation: 46901

You could use collections.defaultdict to build your result dictionary:

from io import StringIO
import csv
from collections import defaultdict

text = '''id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear'''

data = defaultdict(list)

with StringIO(text) as file:
    for row in csv.DictReader(file):
        data[row['Emotion']].append(row['begin'])

print(data)
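The code above stores the 'begin' values as strings. Since the question's edit asks for integers, a variant could convert each value while appending and turn the defaultdict into a plain dict at the end. This is only a sketch on a shortened sample; the name result is introduced here for illustration.

```python
import csv
from collections import defaultdict
from io import StringIO

# Shortened sample in the same layout as the question's file.
text = '''id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17472,1578,1602,anger'''

data = defaultdict(list)
for row in csv.DictReader(StringIO(text)):
    # convert the 'begin' column to int before storing it
    data[row['Emotion']].append(int(row['begin']))

# A plain dict prints without the defaultdict wrapper.
result = dict(data)
print(result)
```

Converting to a plain dict is optional. It mainly makes the printed result look like the desired output in the question.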

Upvotes: 1

Rakesh

Reputation: 82785

Using collections.defaultdict and csv.DictReader, for example:

import csv
import collections

d = collections.defaultdict(list)

with open(filename) as infile:   # filename: path to your CSV file
    reader = csv.DictReader(infile)
    for row in reader:
        d[row["Emotion"]].append(row["begin"])

print(d)

Output:

defaultdict(<type 'list'>, {'anger': ['1578'], 'surprise': ['1534', '1534', '1534', '1534'], 'fear': ['1611'], 'anticipation': ['772'], 'disgust': ['772', '1345']})
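If you would rather keep the groupby comprehension from the question, the nesting seems to come from the slice x[1:2]: a slice always returns a list, so every row contributes a one-element list, and the outer list(...) collects those. Indexing with x[1] and sorting the whole group once should give flat lists of integers. A minimal sketch, using an in-memory copy of the sample data instead of the file dialog and pickle step:

```python
import csv
from io import StringIO
from itertools import groupby

text = '''{http://www.omg.org/XMI}id,begin,end,Emotion
17266,772,781,anticipation
17402,772,781,disgust
17304,1345,1370,disgust
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17424,1534,1543,surprise
17472,1578,1602,anger
17525,1611,1617,fear
'''

rows = list(csv.reader(StringIO(text)))[1:]  # drop the header row

def emotion(row):
    """Grouping key: the last column of a row."""
    return row[-1]

# groupby only groups adjacent items, so the rows are sorted by the key first.
gold_emo_offsets = {
    key: sorted(int(row[1]) for row in group)  # row[1] is a scalar, not a slice
    for key, group in groupby(sorted(rows, key=emotion), key=emotion)
}

print(gold_emo_offsets)
```

Unlike the defaultdict versions, this needs all rows in memory and sorts them first, so it is mostly worth it if you want to stay close to the original code.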

Upvotes: 1
