Lee Yaan

Reputation: 547

Save text elements to a dictionary in python

I have a text file with the following format:

attr1    1,3,7,6,8,12,24,56
attr2    1,2,3
attr4    56,45,48,23,24,25,29,90,56,57,58,59
attr5    1,2,3,45,6,7,8,9,34,33

I want to create a dict where the numbers are the keys and each key maps to a list of every attr whose line contains that number. For the example above, the dict should be:

    1: [attr1, attr2, attr5]
    2: [attr2, attr5]
    3: [attr1, attr2, attr5]
    6: [attr1, attr5]
etc.

I tried to implement this with the following code, but it doesn't work:

file2 = open("attrs.txt","r")
lines2 = file2.readlines()
d = dict()
list1 = []
for x in lines2:
    x = x.strip()
    x = x.split('\t')
    y = x[0]
    list1.append(x[1].split(','))
    for i in list1:
        d[i] = y
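For reference, the attempt above fails for a few reasons. `list1.append(x[1].split(','))` appends a whole list, so `i` is a list and `d[i] = y` raises `TypeError: unhashable type: 'list'`. `list1` is never reset, so earlier lines would be reprocessed. Even with hashable keys, `d[i] = y` overwrites the previous attr instead of collecting them. If the columns are separated by spaces rather than tabs, `x.split('\t')` returns a single element and `x[1]` raises `IndexError`. A minimal sketch that keeps the same loop shape, assuming whitespace-separated columns (the helper name `build_index` and the inline sample are illustrative, not from the question):

```python
def build_index(lines):
    """Map each number to the list of attribute names whose line contains it."""
    d = dict()
    for x in lines:
        x = x.strip()
        if not x:
            continue  # skip blank lines
        # split once on any run of whitespace: attribute name, then the numbers
        y, numbers = x.split(None, 1)
        for i in numbers.split(','):
            # append to this number's list instead of overwriting it
            d.setdefault(int(i), []).append(y)
    return d


sample = """attr1    1,3,7,6,8,12,24,56
attr2    1,2,3
attr4    56,45,48,23,24,25,29,90,56,57,58,59
attr5    1,2,3,45,6,7,8,9,34,33
"""

d = build_index(sample.splitlines())
# with the real file: with open("attrs.txt") as f: d = build_index(f)
```

Because attr4 lists 56 twice, `d[56]` contains `'attr4'` twice; use a set per key if duplicates should be dropped.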

Upvotes: 2

Views: 473

Answers (5)

Alain T.

Reputation: 42143

You could also do it like this:

import functools as fn

# read the whole input file as one string
with open("attrs.txt") as f:
    text = f.read()

#
# 1) parse the input into a list of dictionaries,
#    each mapping every number on a line to a one-attribute list
#    (split() handles tabs or spaces; blank lines are skipped)
#
attribKeys = [dict.fromkeys(nums.split(","), [attr])
              for attr, nums in (line.split() for line in text.splitlines() if line.strip())]

#
# 2) merge the dictionaries, concatenating the lists for each key
#
d = fn.reduce(lambda d, a: dict(list(d.items()) + [(k, d.get(k, []) + v) for k, v in a.items()]), attribKeys)

# d will contain: 
#
# {'1': ['attr1', 'attr2', 'attr5'], 
#  '2': ['attr2', 'attr5'], 
#  '3': ['attr1', 'attr2', 'attr5'],
#  '6': ['attr1', 'attr5'], 
#  '7': ['attr1', 'attr5'], 
#  '8': ['attr1', 'attr5'], 
#  '9': ['attr5'],
#  '12': ['attr1'], 
#  '23': ['attr4'], 
#  '24': ['attr1', 'attr4'],
#  '25': ['attr4'], 
#  '29': ['attr4'],
#  '33': ['attr5'], 
#  '34': ['attr5'], 
#  '45': ['attr4', 'attr5'], 
#  '48': ['attr4'], 
#  '56': ['attr1', 'attr4'], 
#  '57': ['attr4'], 
#  '58': ['attr4'], 
#  '59': ['attr4'], 
#  '90': ['attr4']}
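One detail worth knowing about this approach: `dict.fromkeys(...)` with a list as the value stores the same list object under every key. That is safe here only because the merge step builds new lists with `+` rather than appending in place. A small illustration with made-up keys:

```python
# dict.fromkeys stores the SAME list object under every key
shared = dict.fromkeys(["1", "3"], ["attr1"])
assert shared["1"] is shared["3"]

# mutating through one key shows up under every key...
shared["1"].append("attr2")
print(shared)  # {'1': ['attr1', 'attr2'], '3': ['attr1', 'attr2']}

# ...whereas `+` builds a new list and leaves the shared one alone
fresh = dict.fromkeys(["1", "3"], ["attr1"])
fresh["1"] = fresh["1"] + ["attr2"]
print(fresh)  # {'1': ['attr1', 'attr2'], '3': ['attr1']}
```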

Upvotes: 0

Harvey

Reputation: 5821

Use defaultdict and set:

#!/usr/bin/env python3

import io
import collections
from pprint import pprint


fdata = """attr1    1,3,7,6,8,12,24,56
attr2   1,2,3
attr4   56,45,48,23,24,25,29,90,56,57,58,59
attr5   1,2,3,45,6,7,8,9,34,33
"""

# with open('attrs.txt') as f:
with io.StringIO(fdata) as f:
    d = collections.defaultdict(set)
    for line in f:
        name, keys = line.strip().split()
        for k in keys.split(','):
            d[int(k)].add(name)

pprint(d)
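The result above is a `defaultdict` of sets, so `pprint` shows it wrapped in `defaultdict(<class 'set'>, ...)` and the attribute names have no guaranteed order. If you want the plain dict of lists from the question, a final conversion step is one option. A sketch that rebuilds `d` from inline sample lines so it runs on its own:

```python
import collections

lines = [
    "attr1    1,3,7,6,8,12,24,56",
    "attr2    1,2,3",
    "attr4    56,45,48,23,24,25,29,90,56,57,58,59",
    "attr5    1,2,3,45,6,7,8,9,34,33",
]

d = collections.defaultdict(set)
for line in lines:
    name, keys = line.split()
    for k in keys.split(','):
        d[int(k)].add(name)

# plain dict with keys in numeric order, each set turned into a sorted list
result = {k: sorted(v) for k, v in sorted(d.items())}
```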

Upvotes: 0

Claire

Reputation: 719

You can do it in plain Python, without any third-party library.

I used a set for each dictionary value, in case your data contain duplicate entries, and convert the sets to sorted lists at the end.

d = dict()
with open("attrs.txt", "r") as f:
    for line in f:
        attr, newkeys = line.strip().split()
        newkeys = [int(x) for x in newkeys.split(',')]
        for key in newkeys:
            # create the set the first time a number is seen
            if key not in d:
                d[key] = set()
            d[key].add(attr)
# turn each set into a sorted list
for key in d:
    d[key] = sorted(d[key])

Which approach is best depends on your purpose and on how complex the actual data format is. For an input file with more columns, or when you have nothing else to do while reading the file, pandas (as in @jpp's answer) may be the better way to go.

Upvotes: 0

jpp

Reputation: 164673

If you are happy using a 3rd party library, pandas provides one way:

import pandas as pd
from io import StringIO

mystr = StringIO("""
attr1    1,3,7,6,8,12,24,56
attr2    1,2,3
attr4    56,45,48,23,24,25,29,90,56,57,58,59
attr5    1,2,3,45,6,7,8,9,34,33""")

# replace mystr with 'file.csv'
# sep=r'\s+' splits on any run of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(mystr, sep=r'\s+', header=None, names=['attrs', 'lists'])

# convert identifier column to int
df['attrs'] = df['attrs'].str[4:].map(int)

# split each comma-separated string and convert the numbers to int
df['lists'] = [list(map(int, x.split(','))) for x in df['lists']]

d = df.set_index('attrs')['lists'].to_dict()

# {1: [1, 3, 7, 6, 8, 12, 24, 56],
#  2: [1, 2, 3],
#  4: [56, 45, 48, 23, 24, 25, 29, 90, 56, 57, 58, 59],
#  5: [1, 2, 3, 45, 6, 7, 8, 9, 34, 33]}
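Note that the dictionary above maps each attribute number to its list of numbers, which is the inverse of the mapping the question asks for. A sketch of the inversion in pandas, assuming pandas 0.25 or later (for `DataFrame.explode`) and keeping the attribute names as strings. Dropping duplicate rows is an optional choice so that attr4's repeated 56 appears only once:

```python
import pandas as pd
from io import StringIO

data = StringIO("""attr1    1,3,7,6,8,12,24,56
attr2    1,2,3
attr4    56,45,48,23,24,25,29,90,56,57,58,59
attr5    1,2,3,45,6,7,8,9,34,33""")

df = pd.read_csv(data, sep=r'\s+', header=None, names=['attrs', 'lists'])
df['lists'] = [list(map(int, x.split(','))) for x in df['lists']]

# one row per (attribute, number) pair
pairs = df.explode('lists')
pairs['lists'] = pairs['lists'].astype(int)
# optional: drop repeated pairs such as attr4's double 56
pairs = pairs.drop_duplicates()

# collect the attribute names for each number (groupby keeps row order within a group)
d = pairs.groupby('lists')['attrs'].apply(list).to_dict()
```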

Upvotes: 0

Ajax1234

Reputation: 71451

You can use collections.defaultdict:

import collections
import re

# read "name    n1,n2,..." lines into [name, [ints]] pairs, skipping blank lines
with open('filename.txt') as f:
    file_data = [[a, list(map(int, b.split(',')))]
                 for a, b in [re.split(r'\s+', i.strip()) for i in f if i.strip()]]

d = collections.defaultdict(list)
for a, b in file_data:
    for i in b:
        d[i].append(a)

print(dict(d))

Output:

{1: ['attr1', 'attr2', 'attr5'], 2: ['attr2', 'attr5'], 3: ['attr1', 'attr2', 'attr5'], 6: ['attr1', 'attr5'], 7: ['attr1', 'attr5'], 8: ['attr1', 'attr5'], 9: ['attr5'], 12: ['attr1'], 23: ['attr4'], 24: ['attr1', 'attr4'], 25: ['attr4'], 90: ['attr4'], 29: ['attr4'], 33: ['attr5'], 34: ['attr5'], 45: ['attr4', 'attr5'], 48: ['attr4'], 56: ['attr1', 'attr4', 'attr4'], 57: ['attr4'], 58: ['attr4'], 59: ['attr4']}

Or, a shorter but more complex solution using itertools.groupby:

import itertools
new_data = list(itertools.chain(*[[[i, a] for i in b] for a, b in file_data]))
final_result = {a:[b for _, b in c] for a, c in itertools.groupby(sorted(new_data, key=lambda x:x[0]), key=lambda x:x[0])}

Output:

{1: ['attr1', 'attr2', 'attr5'], 2: ['attr2', 'attr5'], 3: ['attr1', 'attr2', 'attr5'], 33: ['attr5'], 6: ['attr1', 'attr5'], 7: ['attr1', 'attr5'], 8: ['attr1', 'attr5'], 9: ['attr5'], 12: ['attr1'], 34: ['attr5'], 45: ['attr4', 'attr5'], 48: ['attr4'], 56: ['attr1', 'attr4', 'attr4'], 90: ['attr4'], 57: ['attr4'], 23: ['attr4'], 24: ['attr1', 'attr4'], 25: ['attr4'], 58: ['attr4'], 59: ['attr4'], 29: ['attr4']}
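The `sorted(...)` call before `itertools.groupby` matters because `groupby` only groups *consecutive* items with equal keys. A small illustration with made-up pairs:

```python
import itertools

pairs = [(1, 'attr1'), (3, 'attr1'), (1, 'attr2')]

# without sorting, the two 1s are not adjacent, so they form separate groups
unsorted_keys = [k for k, _ in itertools.groupby(pairs, key=lambda p: p[0])]
print(unsorted_keys)  # [1, 3, 1]

# sorting first (sorted is stable) brings equal keys together
sorted_groups = {k: [name for _, name in g]
                 for k, g in itertools.groupby(sorted(pairs, key=lambda p: p[0]),
                                               key=lambda p: p[0])}
print(sorted_groups)  # {1: ['attr1', 'attr2'], 3: ['attr1']}
```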

Upvotes: 2
