Chad D

Reputation: 559

Lookup dictionary in Python

I have a space-delimited file with multiple lines that look like this:

A1BG      P04217     VAR_018369  p.His52Arg     Polymorphism  rs893184    -
A1BG      P04217     VAR_018370  p.His395Arg    Polymorphism  rs2241788   -
AAAS      Q9NRG9     VAR_012804  p.Gln15Lys     Disease       -           Achalasia

How do I build a dictionary that is keyed on the id in the second column and stores the number (between the letters) from the fourth column?

I tried this, but it gives me an "index out of range" error:

lookup = defaultdict(list)
with open ('humsavar.txt', 'r') as humsavarTxt:
    for line in csv.reader(humsavarTxt):
        code = re.match('[a-z](\d+)[a-z]', line[1], re.I)
        if code: 
            lookup[line[-2]].append(code.group(1))

print lookup['P04217']

Upvotes: 1

Views: 451

Answers (3)

the wolf

Reputation: 35522

If you want a pure dictionary, this works:

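import re

d = {}
with open(your_file, 'r') as f:  # your_file: path to the data file
    for line in f:
        l = line.split()  # split on any run of whitespace
        # first run of digits in the fourth column, e.g. 52 in 'p.His52Arg'
        num = int(re.search(r'(\d+)', l[3]).group(1))
        d.setdefault(l[1], []).append(num)
print(d)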

Prints:

{'P04217': [52, 395], 'Q9NRG9': [15]}

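For a non-regex solution, you can also do this:

d = {}
with open(your_file, 'r') as f:  # your_file: path to the data file
    for line in f:
        els = line.split()  # split on any run of whitespace
        # keep only the digit characters of the fourth column
        num = int(''.join(c for c in els[3] if c.isdigit()))
        d.setdefault(els[1], []).append(num)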
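
If you want to try the idea without the file on disk, here is a minimal sketch that runs on the three sample rows from the question. The names `SAMPLE_LINES` and `build_lookup` are made up for illustration, and skipping lines with fewer than four fields is my assumption about how stray lines should be handled, not something the question specifies.

```python
import re

# Sample rows copied from the question; in practice these would come from the file.
SAMPLE_LINES = [
    "A1BG      P04217     VAR_018369  p.His52Arg     Polymorphism  rs893184    -",
    "A1BG      P04217     VAR_018370  p.His395Arg    Polymorphism  rs2241788   -",
    "AAAS      Q9NRG9     VAR_012804  p.Gln15Lys     Disease       -           Achalasia",
]


def build_lookup(lines):
    """Map the id in column 2 to the list of numbers found in column 4."""
    lookup = {}
    for line in lines:
        fields = line.split()  # no argument: any run of whitespace is one separator
        if len(fields) < 4:
            continue  # assumption: blank or short lines are simply skipped
        match = re.search(r'\d+', fields[3])
        if match is None:
            continue  # fourth column holds no digits
        lookup.setdefault(fields[1], []).append(int(match.group()))
    return lookup


print(build_lookup(SAMPLE_LINES))
```

To use it on the real file, you could pass the open file object instead of the list, since iterating over a file yields its lines.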

Upvotes: 0

DSM

Reputation: 353019

Here's a variant of the original code:

import csv, re
from collections import defaultdict

lookup = defaultdict(list)
with open('humsavar.txt', 'rb') as humsavarTxt:
    reader = csv.reader(humsavarTxt, delimiter=" ", skipinitialspace=True)
    for line in reader:
        code = re.search(r'(\d+)', line[3])
        lookup[line[1]].append(int(code.group(1)))

which produces

>>> lookup
defaultdict(<type 'list'>, {'P04217': [52, 395], 'Q9NRG9': [15]})
>>> lookup['P04217']
[52, 395]
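
As for why the original raised an index error: as far as I can tell, `csv.reader` splits on commas by default, so each row came back as a one-element list and `line[1]` was out of range. Separately, `re.match` only matches at the start of the string, so the pattern would not have found the digits in `p.His52Arg` even with the right column. A small sketch on one sample row (the variable names are just for illustration):

```python
import csv
import io
import re

sample = "A1BG      P04217     VAR_018369  p.His52Arg     Polymorphism  rs893184    -\n"

# Default dialect: comma-delimited, so the whole line is a single field.
default_row = next(csv.reader(io.StringIO(sample)))

# Space-delimited, with runs of spaces after a delimiter skipped.
spaced_row = next(csv.reader(io.StringIO(sample), delimiter=" ", skipinitialspace=True))

print(len(default_row))              # only one field, so index 1 does not exist
print(spaced_row[1], spaced_row[3])  # the id and the fourth column

# re.match is anchored at the start of the string; re.search scans the whole string.
anchored = re.match(r'[a-z](\d+)[a-z]', spaced_row[3], re.I)
scanned = re.search(r'(\d+)', spaced_row[3])
print(anchored, scanned.group(1))
```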

Upvotes: 3

Niek de Klein

Reputation: 8824

If the id and the number are always in the second and fourth columns, and the file is always space-delimited, you don't need a regular expression. You can split on whitespace instead:

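from collections import defaultdict

lookup = defaultdict(list)
with open('humsavar.txt', 'r') as humsavarTxt:
    for line in humsavarTxt:
        # split() with no argument treats a run of spaces as one separator;
        # split(' ') would yield empty strings between consecutive spaces
        fields = line.split()
        if len(fields) > 3:
            # stores the whole fourth field, e.g. 'p.His52Arg'
            lookup[fields[1]].append(fields[3])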

Upvotes: 1
