Reputation: 547
I have a text file with the following format:
attr1 1,3,7,6,8,12,24,56
attr2 1,2,3
attr4 56,45,48,23,24,25,29,90,56,57,58,59
attr5 1,2,3,45,6,7,8,9,34,33
I want to create a dict in which the numbers are the keys and each value is a list of the attrs whose line contains that number. For the example above, the dict should be:
1: [attr1,attr2,attr5]
2: [attr2,attr5]
3: [attr1,attr2,attr5]
6: [attr1, attr5]
etc...
I tried to implement this with the following code, but it doesn't work:
file2 = open("attrs.txt","r")
lines2 = file2.readlines()
d = dict()
list1 = []
for x in lines2:
    x = x.strip()
    x = x.split('\t')
    y = x[0]
    list1.append(x[1].split(','))
    for i in list1:
        d[i] = y
Upvotes: 2
Views: 473
Reputation: 42143
You could also do it like this:
#
# 1) parse the input into an array of dictionaries
# having a one attribute list for each key
#
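# read the file into a list of lines; split() with no argument accepts tabs or spaces
with open("attrs.txt") as f:
    lines = f.read().splitlines()
attribKeys = [ dict.fromkeys(ak[1].split(","),[ak[0]]) for ak in [ line.split() for line in lines if line.strip() ] ]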
#
# 2) merge the dictionaries concatenating lists for each key
#
import functools as fn
d = fn.reduce(lambda d,a: dict(list(d.items()) + [(k,d.get(k,[])+v) for k,v in a.items()]),attribKeys)
# d will contain (shown here sorted by key):
#
# {'1': ['attr1', 'attr2', 'attr5'],
# '2': ['attr2', 'attr5'],
# '3': ['attr1', 'attr2', 'attr5'],
# '6': ['attr1', 'attr5'],
# '7': ['attr1', 'attr5'],
# '8': ['attr1', 'attr5'],
# '9': ['attr5'],
# '12': ['attr1'],
# '23': ['attr4'],
# '24': ['attr1', 'attr4'],
# '25': ['attr4'],
# '29': ['attr4'],
# '33': ['attr5'],
# '34': ['attr5'],
# '45': ['attr4', 'attr5'],
# '48': ['attr4'],
# '56': ['attr1', 'attr4'],
# '57': ['attr4'],
# '58': ['attr4'],
# '59': ['attr4'],
# '90': ['attr4']}
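For comparison, the same merge can be sketched as a plain loop with dict.setdefault. This is only an illustration, not the answer's own method: the helper name invert and the hard-coded sample lines are assumptions made so that the snippet runs without the file.

```python
# Sample lines copied from the question; in practice they would be read from the file.
lines = [
    "attr1 1,3,7,6,8,12,24,56",
    "attr2 1,2,3",
    "attr4 56,45,48,23,24,25,29,90,56,57,58,59",
    "attr5 1,2,3,45,6,7,8,9,34,33",
]


def invert(lines):
    """Map each number (kept as a string) to the attrs whose line lists it."""
    result = {}
    for line in lines:
        attr, numbers = line.split()
        # dict.fromkeys drops numbers repeated on one line while keeping their order
        for number in dict.fromkeys(numbers.split(",")):
            result.setdefault(number, []).append(attr)
    return result


d = invert(lines)
print(d["1"])   # ['attr1', 'attr2', 'attr5']
print(d["56"])  # ['attr1', 'attr4']
```

Like the reduce version, this keeps the keys as strings and lists each attr once per number.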
Upvotes: 0
Reputation: 5821
Use defaultdict and set:
#!/usr/bin/env python3
import io
import collections
from pprint import pprint
fdata = """attr1 1,3,7,6,8,12,24,56
attr2 1,2,3
attr4 56,45,48,23,24,25,29,90,56,57,58,59
attr5 1,2,3,45,6,7,8,9,34,33
"""
# with open('attrs.txt') as f:
with io.StringIO(fdata) as f:
    d = collections.defaultdict(set)
    for line in f:
        name, keys = line.strip().split()
        for k in keys.split(','):
            d[int(k)].add(name)
pprint(d)
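The question asks for lists rather than sets, so a conversion step may be wanted at the end. A minimal sketch, assuming d is a defaultdict(set) filled as above (only three hand-written entries are used here to keep it short):

```python
import collections

# A few entries shaped like the ones the loop above produces.
d = collections.defaultdict(set)
d[56].update({"attr1", "attr4"})
d[1].update({"attr1", "attr2", "attr5"})
d[2].update({"attr5", "attr2"})

# Build a plain dict ordered by key, with each set turned into a sorted list.
result = {k: sorted(d[k]) for k in sorted(d)}
print(result)
# {1: ['attr1', 'attr2', 'attr5'], 2: ['attr2', 'attr5'], 56: ['attr1', 'attr4']}
```

Sorting the attr names is a choice made here for reproducible output; sets do not keep the order in which the attrs appeared in the file.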
Upvotes: 0
Reputation: 719
You can do it with native Python functions.
I used a set for each dictionary value and converted the sets into sorted lists at the end, in case your data contains duplicate entries.
keys = set()
d = dict()
f = open("attrs.txt","r")
for line in f:
    attr,newkeys = line.strip().split()
    newkeys = [int(x) for x in newkeys.split(',')]
    for key in newkeys:
        if key not in keys:
            d[key] = set()
            keys.add(key)
        d[key].add(attr)
for key in list(d):
    d[key]=sorted(list(d[key]))
f.close()
Which approach fits best depends on your purpose and on how complex the actual data format is. For an input file with more columns, or when there is nothing else to do while reading the file, pandas (as in @jpp's answer) will be the better way to go.
Upvotes: 0
Reputation: 164673
If you are happy using a 3rd party library, pandas provides one way:
import pandas as pd
from io import StringIO
mystr = StringIO("""
attr1 1,3,7,6,8,12,24,56
attr2 1,2,3
attr4 56,45,48,23,24,25,29,90,56,57,58,59
attr5 1,2,3,45,6,7,8,9,34,33""")
# replace mystr with 'file.csv'
# sep=r'\s+' splits on any run of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(mystr, sep=r'\s+', header=None, names=['attrs', 'lists'])
# convert identifier column to int
df['attrs'] = df['attrs'].str[4:].map(int)
# split and convert attrs to int
df['lists'] = [list(map(int, x.split(','))) for x in df['lists']]
d = df.set_index('attrs')['lists'].to_dict()
# {1: [1, 3, 7, 6, 8, 12, 24, 56],
# 2: [1, 2, 3],
# 4: [56, 45, 48, 23, 24, 25, 29, 90, 56, 57, 58, 59],
# 5: [1, 2, 3, 45, 6, 7, 8, 9, 34, 33]}
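The dict above maps each attr number to its list of values, while the question asks for the reverse direction. One possible way to invert it in plain Python is sketched below. It assumes the keys are the integer suffixes produced above, rebuilds the names with an "attr" prefix, and uses only the first two rows for brevity.

```python
from collections import defaultdict

# attr number -> values, in the shape produced above (first two rows only)
d = {1: [1, 3, 7, 6, 8, 12, 24, 56], 2: [1, 2, 3]}

inverted = defaultdict(list)
for attr_num, values in d.items():
    for value in values:
        # rebuild the original attr name from its numeric suffix
        inverted[value].append(f"attr{attr_num}")

inverted = dict(inverted)
print(inverted[1])  # ['attr1', 'attr2']
print(inverted[7])  # ['attr1']
```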
Upvotes: 0
Reputation: 71451
You can use collections.defaultdict:
import collections
import re
# read inside a with block so the file is closed; blank lines are skipped
with open('filename.txt') as f:
    file_data = [[a, list(map(int, b.split(',')))] for a, b in [re.split(r'\s+', i.strip()) for i in f if i.strip()]]
d = collections.defaultdict(list)
for a, b in file_data:
    for i in b:
        d[i].append(a)
print(dict(d))
Output:
{1: ['attr1', 'attr2', 'attr5'], 2: ['attr2', 'attr5'], 3: ['attr1', 'attr2', 'attr5'], 6: ['attr1', 'attr5'], 7: ['attr1', 'attr5'], 8: ['attr1', 'attr5'], 9: ['attr5'], 12: ['attr1'], 23: ['attr4'], 24: ['attr1', 'attr4'], 25: ['attr4'], 90: ['attr4'], 29: ['attr4'], 33: ['attr5'], 34: ['attr5'], 45: ['attr4', 'attr5'], 48: ['attr4'], 56: ['attr1', 'attr4', 'attr4'], 57: ['attr4'], 58: ['attr4'], 59: ['attr4']}
Or a shorter, although more complex, solution using itertools.groupby:
import itertools
new_data = list(itertools.chain(*[[[i, a] for i in b] for a, b in file_data]))
final_result = {a:[b for _, b in c] for a, c in itertools.groupby(sorted(new_data, key=lambda x:x[0]), key=lambda x:x[0])}
Output:
{1: ['attr1', 'attr2', 'attr5'], 2: ['attr2', 'attr5'], 3: ['attr1', 'attr2', 'attr5'], 33: ['attr5'], 6: ['attr1', 'attr5'], 7: ['attr1', 'attr5'], 8: ['attr1', 'attr5'], 9: ['attr5'], 12: ['attr1'], 34: ['attr5'], 45: ['attr4', 'attr5'], 48: ['attr4'], 56: ['attr1', 'attr4', 'attr4'], 90: ['attr4'], 57: ['attr4'], 23: ['attr4'], 24: ['attr1', 'attr4'], 25: ['attr4'], 58: ['attr4'], 59: ['attr4'], 29: ['attr4']}
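The sorted call in the groupby version matters because itertools.groupby only groups consecutive items that share a key. A small sketch of the difference, using a hand-made list of (number, attr) pairs as an assumption in place of new_data:

```python
import itertools
from operator import itemgetter

# (number, attr) pairs in file order: the 1s and 3s are not adjacent
pairs = [(1, "attr1"), (3, "attr1"), (1, "attr2"), (3, "attr2")]
key = itemgetter(0)

# Without sorting, every run of equal keys becomes its own group.
unsorted_groups = [(k, [a for _, a in g]) for k, g in itertools.groupby(pairs, key=key)]

# After sorting by the same key, each number ends up in exactly one group.
sorted_groups = [
    (k, [a for _, a in g])
    for k, g in itertools.groupby(sorted(pairs, key=key), key=key)
]

print(unsorted_groups)
# [(1, ['attr1']), (3, ['attr1']), (1, ['attr2']), (3, ['attr2'])]
print(sorted_groups)
# [(1, ['attr1', 'attr2']), (3, ['attr1', 'attr2'])]
```

Building a dict from the unsorted groups would silently keep only the last group for each number, which is why the sort comes first.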
Upvotes: 2