Brebenel

Reputation: 345

Python-Dictionary from file with two keys and multiple values

(This question has been answered in a couple of previous posts on Stack Overflow. However, I cannot get the right result and cannot figure out what I am doing wrong.)

I would like to create a dictionary from a text file that contains two keys and 14 values:

data.txt:

Key1    Key2    Val1    Val2    Val3…Val14
100       a     x0      y0      z0………n0
101       a     x1      y1      z1………n1
102       b     x2      y2      z2………n2
103       b     x3      y3      z3………n3
104       c     x4      y4      z4………n4
105       c     x5      y5      z5………n5
…
140       m     xm      ym      zm………nm

The dictionary should look like this:

{100: {a: [x0, y0, z0,…n0]},
101: {a: [x1, y1, z1,…n1]},
102: {b: [x2, y2, z2,…n2]},
103: {b: [x3, y3, z3,…n3]},
 …
140: {m: [xm, ym, zm,…nm]}}

I have tried Code1 and Code2. Code1 gives a very large dictionary where lines are repeated with other lines appended to them. Code2 gives the error TypeError: unhashable type: 'slice'.

Code1:
lookupfile = open("data.txt", 'r')
lines = lookupfile.readlines()
lookup = lines[1:]   # Start the dictionary from row 1, exclude the column names
d={}
for line in lookup:
    dic = line.split()
    d.update({dic[0]: {dic[1]: dic[2:]}})
    print(d) 

Code2:
data = defaultdict(dict)
with open('data.txt', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        data[row['Key1']][row['Key2']]=row['Val1':]
        print (data)

I would prefer the code to look like Code2, so I can later use the column names. But, I would appreciate any help.

I can provide additional information, if needed.

Upvotes: 1

Views: 1521

Answers (2)

Alex Martelli

Reputation: 881595

You're using a DictReader, so each row is a dict, and you can't slice a dict (as you're trying to do on the RHS of the assignment).

So use a plain csv.reader (so each row is a list, which you can slice) and:

data[row[0]][row[1]]=row[2:]
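
A minimal sketch of that suggestion follows. It assumes the columns in data.txt are separated by runs of spaces; since csv.reader splits on commas by default, delimiter=' ' and skipinitialspace=True are set here to match that assumption. The sample rows are inlined through io.StringIO (an illustration, not the real file) so the snippet runs on its own:

```python
import csv
import io
from collections import defaultdict

# Sample rows inlined for illustration; with the real file,
# use open('data.txt', newline='') instead.
sample = io.StringIO(
    "Key1 Key2 Val1 Val2 Val3\n"
    "100       a     x0      y0      z0\n"
    "101       a     x1      y1      z1\n"
    "102       b     x2      y2      z2\n"
)

data = defaultdict(dict)
# Assumption: columns are separated by one or more spaces.
reader = csv.reader(sample, delimiter=' ', skipinitialspace=True)
next(reader)  # skip the header row
for row in reader:
    if not row:  # ignore blank lines
        continue
    # Each row is a list, so slicing works here.
    data[row[0]][row[1]] = row[2:]

print(dict(data))
```

Note that the keys are strings ('100', not 100); convert with int(row[0]) if integer keys are needed.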

Upvotes: 1

Padraic Cunningham

Reputation: 180401

s="""Key1    Key2    Val1    Val2    Val3…Val14
100       a     x0      y0      z0
101       a     x1      y1      z1
102       b     x2      y2      z2
103       b     x3      y3      z3
104       c     x4      y4      z4
105       c     x5      y5      z5"""
d  = {}
for line in s.splitlines()[1:]:
    spl = line.split()
    d[spl[0]] ={spl[1]:spl[2:]}

from pprint import pprint
pprint(d)
{'100': {'a': ['x0', 'y0', 'z0']},
 '101': {'a': ['x1', 'y1', 'z1']},
 '102': {'b': ['x2', 'y2', 'z2']},
 '103': {'b': ['x3', 'y3', 'z3']},
 '104': {'c': ['x4', 'y4', 'z4']},
 '105': {'c': ['x5', 'y5', 'z5']}}

The same logic applies to your file code. To skip the first line, call next on the file object, then simply index each row as above.

d = {}
with open('data.txt', 'r') as f:
    next(f) # skip header
    for row in f:
        spl = row.split()  # split the current row on any run of whitespace
        # slicing using spl[2:] will give you a list of all remaining values
        d[spl[0]] = {spl[1]:spl[2:]}

If you actually have multiple spaces between your columns, using str.split will work better than using the csv module.
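
Since the question mentions wanting to use the column names later, here is one possible sketch that keeps the str.split approach but also reads the header. The by_name dictionary and the int conversion of Key1 are illustrative assumptions, not part of the original code, and the sample rows are inlined through io.StringIO so the snippet runs without data.txt:

```python
import io

# Sample rows inlined for illustration; replace with open('data.txt') for the real file.
sample = io.StringIO(
    "Key1    Key2    Val1    Val2    Val3\n"
    "100       a     x0      y0      z0\n"
    "101       a     x1      y1      z1\n"
)

d = {}        # nested layout from the question: {Key1: {Key2: [values]}}
by_name = {}  # hypothetical extra lookup: {Key1: {column name: value}}

header = next(sample).split()  # consume the header line and keep the names
value_names = header[2:]       # names of the value columns only

for line in sample:
    spl = line.split()
    if not spl:  # ignore blank lines
        continue
    key1 = int(spl[0])  # assumes Key1 is always an integer, as in the desired output
    d[key1] = {spl[1]: spl[2:]}
    by_name[key1] = dict(zip(value_names, spl[2:]))

print(d[100])                # {'a': ['x0', 'y0', 'z0']}
print(by_name[101]['Val2'])  # y1
```

Pairing the header with each row via zip gives the name-based access that DictReader would provide, without depending on a single-character delimiter.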

Upvotes: 4
