nobo

Reputation: 125

Appending regex matches to a dictionary

I have a file in which there is the following info:

dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
elephants_3535.txt:37.092592592592595
fish_1407.txt:24.132530120481928
fish_2078.txt:23.470588235294116
fish_2041.txt:23.564705882352943
fish_666.txt:23.17241379310345
fish_840.txt:21.77173913043478

I'm looking for a way to match the colon and append whatever follows it (the number) to a dictionary whose keys are the animal names at the start of each line.

Upvotes: 1

Views: 231

Answers (4)

Facundo Casco

Reputation: 10585

Without regex and using defaultdict:

from collections import defaultdict

data = """dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
elephants_3535.txt:37.092592592592595
fish_1407.txt:24.132530120481928
fish_2078.txt:23.470588235294116
fish_2041.txt:23.564705882352943
fish_666.txt:23.17241379310345
fish_840.txt:21.77173913043478"""

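dictionary = defaultdict(list)
for line in data.splitlines():
    # The animal name precedes the first underscore; the number follows the colon.
    animal = line.split('_')[0]
    number = line.split(':')[-1]
    # defaultdict(list) creates the empty list on first access, so append directly.
    dictionary[animal].append(number)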

Just make sure your data is well formatted: every line needs an underscore in the name and a colon before the number.
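
If the file might contain blank or malformed lines, a slightly more defensive variant of the loop could look like the sketch below. The helper name `group_by_animal` and the decision to skip bad lines silently are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict


def group_by_animal(lines):
    """Group the text after the colon by the animal name before the first underscore."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        # partition() returns an empty separator when ':' is missing.
        name, sep, number = line.partition(':')
        if not sep or '_' not in name:
            continue  # assumption: skip anything that is not 'animal_N.txt:value'
        grouped[name.split('_', 1)[0]].append(number)
    return grouped


sample = [
    "dogs_3351.txt:34.13559322033898",
    "",
    "not a record",
    "fish_666.txt:23.17241379310345",
    "fish_840.txt:21.77173913043478",
]
result = group_by_animal(sample)
```

Here `result` groups the two `fish` values under one key and ignores the blank and malformed lines. Raising an error instead of skipping may be the better choice if bad input should not pass unnoticed.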

Upvotes: 1

Joel Cornett

Reputation: 24788

Actually, regular expressions are unnecessary, provided that your data is well formatted and contains no surprises.

Assuming that data is a variable containing the string that you listed above:

dict(item.split(":") for item in data.split())
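
Note that the one-liner above keys the dictionary by the full file name, so each key maps to a single string. If you want one list per animal, as the question asks, a possible adaptation is the sketch below. The name `by_animal` is chosen here for illustration, and the data is shortened to three lines.

```python
data = """dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088"""

by_animal = {}
for item in data.split():
    # Each whitespace-separated item looks like 'animal_N.txt:value'.
    filename, number = item.split(":")
    animal = filename.split("_")[0]
    # setdefault() creates the list the first time an animal is seen.
    by_animal.setdefault(animal, []).append(number)
```

Both versions still assume exactly one colon per item; a line with a second colon would make the unpacking fail.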

Upvotes: 4

cobie

Reputation: 7271

Why not use Python's str.find method to locate the index of the colon, then slice the string at that index?

>>> x='dogs_3351.txt:34.13559322033898'
>>> key_index = x.find(':')
>>> key = x[:key_index]
>>> key
'dogs_3351.txt'
>>> value = x[key_index+1:]
>>> value
'34.13559322033898'

Read each line of the file as text and process the lines individually as above.
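
Putting the per-line step into a loop might look like the sketch below. The function name `collect` and the file name `animals.txt` in the comment are hypothetical, and an in-memory `io.StringIO` object stands in for the real file so the snippet runs on its own.

```python
import io


def collect(lines):
    """Build {animal: [numbers]} using str.find, as in the session above."""
    result = {}
    for line in lines:
        line = line.strip()
        key_index = line.find(':')
        if key_index == -1:
            continue  # find() returns -1 when the line has no colon
        key = line[:key_index]
        value = line[key_index + 1:]
        # Keep only the part before the first underscore as the animal name.
        animal = key[:key.find('_')] if '_' in key else key
        result.setdefault(animal, []).append(value)
    return result


# Stand-in for a real file. With a file on disk you could instead write:
#     with open('animals.txt') as f:
#         animals = collect(f)
fake_file = io.StringIO(
    "dogs_3351.txt:34.13559322033898\n"
    "fish_666.txt:23.17241379310345\n"
    "fish_840.txt:21.77173913043478\n"
)
animals = collect(fake_file)
```

The values stay strings here; wrap `value` in `float()` if you need numbers.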

Upvotes: 1

georg

Reputation: 214949

t = """
dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
elephants_3535.txt:37.092592592592595
fish_1407.txt:24.132530120481928
fish_2078.txt:23.470588235294116
fish_2041.txt:23.564705882352943
fish_666.txt:23.17241379310345
fish_840.txt:21.77173913043478
"""

import re

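d = {}
# Group 1 is the animal name (up to the first underscore); group 2 is everything after the colon.
# re.M makes ^ match at the start of every line, not just the start of the string.
for p, q in re.findall(r'^(.+?)_.+?:(.+)', t, re.M):
    d.setdefault(p, []).append(q)

print(d)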

Upvotes: 1
