Lauren

Reputation: 55

I'm having an issue parsing a text file into a dictionary

I have a text file, and I want to load it into a dictionary in Python.

The text looks like this (tab-delimited):

Form Dosage ReferenceDrug drugname activeingred
INJECTABLE; INJECTION 20,000 UNITS/ML LIQUAEMIN SODIUM HEPARIN SODIUM
INJECTABLE; INJECTION 40,000 UNITS/ML LIQUAEMIN SODIUM HEPARIN SODIUM
INJECTABLE; INJECTION 5,000 UNITS/ML LIQUAEMIN SODIUM HEPARIN SODIUM

Right now my code looks like this, but it does not work: I get "list index out of range", and nothing gets added to the dictionary. I don't know where I'm going wrong; I'm not a programmer. Thanks for any help.

import sys

def load_medications(filename):
    meds_dict = {}
    f = open(filename)
    l = " "
    # print f.read()
    for line in f:
        fields = l.split("\t")
        ApplNo = fields[0]
        ProductNo = fields[1]
        Form = fields[2]
        Dosage = fields[3]
        ProductMktStatus = fields[4]
        TECode = fields[5]
        ReferenceDrug = fields[6]
        DrugName = fields[7]
        ActiveIngred = fields[8]

        meds = {
                "ApplNo": ApplNo,   
                "ProductNo": ProductNo, 
                "Form": Form,
                "Dosage": Dosage,   
                "ProductMktStatus": ProductMktStatus,
                "TECode": TECode,
                "ReferenceDrug": ReferenceDrug, 
                "DrugName": DrugName,
                "ActiveIngred": ActiveIngred
            }       
        meds_dict[DrugName] = meds
    f.close()
    return meds_dict


def main():
    x = load_medications("druglist.txt")
    print x



if __name__ == "__main__":
    main()

Upvotes: 1

Views: 144

Answers (7)

PaulMcG

Reputation: 63709

Since your field names are all valid Python identifiers, why not read your data into namedtuples instead of dicts?

data = ("Form Dosage ReferenceDrug drugname activeingred INJECTABLE; "
        "INJECTION\t20,000 UNITS/ML\tLIQUAEMIN\tSODIUM HEPARIN\tSODIUM\tINJECTABLE; "
        "INJECTION\t40,000 UNITS/ML\tLIQUAEMIN\tSODIUM HEPARIN\tSODIUM\tINJECTABLE; "
        "INJECTION\t5,000 UNITS/ML\tLIQUAEMIN\tSODIUM HEPARIN\tSODIUM\tINJECTABLE").split('; ')

from collections import namedtuple

# define class DrugData as a namedtuple, using the headers from data[0]
DrugData = namedtuple("DrugData", data[0])

# use a list comprehension to create a DrugData for each data line
druglist = [DrugData(*line.split('\t')) for line in data[1:]]

# access each tuple in druglist, using attribute access to individual fields
for d in druglist:
    print "%s | %s | %s" % (d.ReferenceDrug, d.Form, d.Dosage)

Prints:

LIQUAEMIN | INJECTION | 20,000 UNITS/ML
LIQUAEMIN | INJECTION | 40,000 UNITS/ML
LIQUAEMIN | INJECTION | 5,000 UNITS/ML

EDIT:

Looking back at your original question, it looks like you want to create a single dict of all of these entries, keyed by drugname. Unfortunately, dict keys have to be unique, and in your example, all 3 entries have the same drugname. You may have to combine 2 or more fields to compose a truly unique key for a dict that handles all of these values, such as a tuple of (drugname, Dosage).
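A minimal Python 3 sketch of the tuple-key idea. It assumes the drug name column holds "LIQUAEMIN SODIUM" (a reading of the flattened sample in the question), and the records are typed in by hand rather than parsed from the file. Keying by name alone keeps only the last record, while a (drugname, Dosage) tuple keeps all three:

```python
# Records as (DrugName, Dosage, Form) tuples; the column split is an assumption.
records = [
    ("LIQUAEMIN SODIUM", "20,000 UNITS/ML", "INJECTABLE; INJECTION"),
    ("LIQUAEMIN SODIUM", "40,000 UNITS/ML", "INJECTABLE; INJECTION"),
    ("LIQUAEMIN SODIUM", "5,000 UNITS/ML", "INJECTABLE; INJECTION"),
]

by_name = {}
by_name_and_dosage = {}
for drug_name, dosage, form in records:
    # Keyed by name alone: each assignment overwrites the previous entry.
    by_name[drug_name] = {"Dosage": dosage, "Form": form}
    # Keyed by a (name, dosage) tuple: every record gets its own entry.
    by_name_and_dosage[(drug_name, dosage)] = {"Dosage": dosage, "Form": form}

print(len(by_name))             # 1
print(len(by_name_and_dosage))  # 3
print(by_name_and_dosage[("LIQUAEMIN SODIUM", "5,000 UNITS/ML")]["Form"])
```

Tuples work as keys because they are immutable and hashable; a list would not.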

OR, change your design slightly so that each drugname points to a list of matching values. The simplest would be to use a defaultdict instead of a dict, so that new entries are automatically initialized with an empty list. In your code, you would add an import statement:

from collections import defaultdict

and change the declaration of meds_dict to:

meds_dict = defaultdict(list)

meaning that any new keys that haven't been seen yet will be initialized using the function/class provided as the argument to defaultdict, in this case list.

Then to add new entries to meds_dict, instead of assigning with '=', you would append to the list of all matching meds/dosages:

meds_dict[DrugName].append(meds)

Now for any DrugName, you'll get the list of matching Form/Dosage/etc. records.
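Put together, the defaultdict approach might look like this Python 3 sketch. The four-column layout and the KEYS list are simplifications assumed for the example, not the file's real column set:

```python
from collections import defaultdict

# Hypothetical, shortened column list for the example.
KEYS = ["Form", "Dosage", "DrugName", "ActiveIngred"]

# Hypothetical tab-separated sample lines, as they would come from a file.
lines = [
    "INJECTABLE; INJECTION\t20,000 UNITS/ML\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n",
    "INJECTABLE; INJECTION\t40,000 UNITS/ML\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n",
    "INJECTABLE; INJECTION\t5,000 UNITS/ML\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n",
]

meds_dict = defaultdict(list)  # a key seen for the first time starts as []
for line in lines:
    meds = dict(zip(KEYS, line.rstrip("\n").split("\t")))
    meds_dict[meds["DrugName"]].append(meds)  # append instead of overwrite

for med in meds_dict["LIQUAEMIN SODIUM"]:
    print(med["Dosage"])
```

This prints the three dosages in file order, all stored under the single "LIQUAEMIN SODIUM" key.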

Upvotes: 0

user1588900

Reputation: 1

It looks like your code assumes there are 9 properties for each drug. The sample text file you posted, however, only has 5 properties. When you split a line with split("\t"), you get a list of only 5 elements, because each line in "druglist.txt" has only 5 fields. So if you index into fields with a value greater than or equal to 5, e.g. fields[8], you will get an "index out of range" exception.
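One way to confirm how many fields each line really has, before indexing into fields, is to tally the counts. The field_counts helper below is a hypothetical Python 3 sketch; in the sample, the ReferenceDrug value is left empty as a placeholder:

```python
from collections import Counter


def field_counts(lines, delimiter="\t"):
    """Map 'number of fields' -> 'number of lines with that many fields'."""
    return Counter(len(line.rstrip("\n").split(delimiter)) for line in lines)


sample = [
    "Form\tDosage\tReferenceDrug\tdrugname\tactiveingred\n",
    "INJECTABLE; INJECTION\t20,000 UNITS/ML\t\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n",
]
print(field_counts(sample))  # Counter({5: 2})
```

With the real file, passing the open file object works the same way, since iterating over it yields lines: with open("druglist.txt") as f: print(field_counts(f)).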

Upvotes: 0

girasquid

Reputation: 15516

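You might have an easier time parsing this data with the csv module in the standard library. If you set it up with tabs as the delimiter, it should have no problems parsing the file you posted. (Note that csv readers ignore the lineterminator setting and always end rows at \r or \n, so ';' cannot be used as a row separator that way; the records need to be on separate lines.)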

Using a DictReader would also make it a little easier to read over your rows (you could refer to things as line['ApplNo'] instead of line[0]).
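A minimal Python 3 sketch of the DictReader approach, assuming the file has one record per line and a tab-separated header row using the names from the question's sample; io.StringIO stands in for the real file, and the empty ReferenceDrug values are placeholders:

```python
import csv
import io

# Hypothetical tab-delimited text with a header row, one record per line.
text = (
    "Form\tDosage\tReferenceDrug\tdrugname\tactiveingred\n"
    "INJECTABLE; INJECTION\t20,000 UNITS/ML\t\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n"
    "INJECTABLE; INJECTION\t40,000 UNITS/ML\t\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n"
)

# With a real file, use open("druglist.txt", newline="") instead of io.StringIO.
reader = csv.DictReader(io.StringIO(text), delimiter="\t")
rows = list(reader)
for row in rows:
    # Fields are looked up by header name rather than by position.
    print(row["drugname"], "|", row["Dosage"])
```

The ';' inside "INJECTABLE; INJECTION" is just an ordinary character here, because only the tab is treated as a delimiter.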

Unfortunately, the headers in your file don't seem to match the names you use in your code (e.g. drugname vs. DrugName), so you would need to supply the field names yourself rather than relying on the ones in the file.
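If you would rather keep the file's header but use your own names in the code, one option is to rename the keys of each row as it is read. A Python 3 sketch; the RENAME mapping is hypothetical and only covers the two names that differ in the sample:

```python
import csv
import io

# Hypothetical mapping from the file's header names to the names used in code.
RENAME = {"drugname": "DrugName", "activeingred": "ActiveIngred"}

text = (
    "Form\tDosage\tReferenceDrug\tdrugname\tactiveingred\n"
    "INJECTABLE; INJECTION\t5,000 UNITS/ML\t\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n"
)

reader = csv.DictReader(io.StringIO(text), delimiter="\t")
# Rename the headers listed in RENAME; keep the others (Form, Dosage, ...) as-is.
rows = [{RENAME.get(key, key): value for key, value in row.items()} for row in reader]
print(rows[0]["DrugName"])  # LIQUAEMIN SODIUM
```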

Upvotes: 2

Andrew Clark

Reputation: 208425

You should look into csv.DictReader for this. Assuming your file has a proper heading line at the beginning, you should be able to create your dictionaries as simply as this:

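import csv

def load_medications(filename):
    meds = {}
    with open(filename) as f:  # make sure the file gets closed
        reader = csv.DictReader(f, delimiter='\t')
        for row in reader:
            # each row is a dict keyed by the names in the heading line
            meds[row['DrugName']] = row
    return meds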

If your file does not have a heading line, you can pass in the field names to the DictReader initializer:

fields = ["ApplNo", "ProductNo", "Form", "Dosage", "ProductMktStatus",
          "TECode", "ReferenceDrug", "DrugName", "ActiveIngred"]
reader = csv.DictReader(open(filename), delimiter='\t', fieldnames=fields)
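One thing to watch with fieldnames: csv.DictReader does not raise when a row has fewer columns than names. The leftover keys are filled with restval (None by default), and the values that are present land on the first names. A Python 3 sketch using the nine names above and a hypothetical five-column line:

```python
import csv
import io

FIELDS = ["ApplNo", "ProductNo", "Form", "Dosage", "ProductMktStatus",
          "TECode", "ReferenceDrug", "DrugName", "ActiveIngred"]

# Hypothetical headerless line with only five of the nine columns.
text = "INJECTABLE; INJECTION\t20,000 UNITS/ML\t\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n"

rows = list(csv.DictReader(io.StringIO(text), delimiter="\t", fieldnames=FIELDS))
row = rows[0]
# Keys with no matching column were filled with restval (None by default).
missing = [name for name, value in row.items() if value is None]

print(row["ApplNo"])  # INJECTABLE; INJECTION  (the Form value, under the wrong key)
print(missing)        # ['TECode', 'ReferenceDrug', 'DrugName', 'ActiveIngred']
```

Checking for None values like this is one way to catch a column-count mismatch that would otherwise pass silently.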

Upvotes: 1

phihag

Reputation: 287775

You actually split l, and not line. You want:

def load_medications(filename):
    meds_dict = {}
    with open(filename) as f: # Ensure that the file gets closed
        for line in f:
            fields = line.split("\t") # line, not l
            keys = ["ApplNo", "ProductNo", "Form", "Dosage", "ProductMktStatus",
                    "TECode", "ReferenceDrug", "DrugName", "ActiveIngred",]

            if len(fields) != len(keys):
                raise ValueError("Malformed input line " + repr(line))

            meds = dict(zip(keys, fields))
            meds_dict[meds["DrugName"]] = meds
    return meds_dict

For details on why this works, read up on zip and dict.
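A small Python 3 illustration of what dict(zip(keys, fields)) does, using a shortened, hypothetical key list. It also shows why the length check matters: zip stops at the shorter input without complaint.

```python
keys = ["Form", "Dosage", "DrugName"]
fields = ["INJECTABLE; INJECTION", "20,000 UNITS/ML", "LIQUAEMIN SODIUM"]

# zip pairs the i-th key with the i-th field; dict() turns the pairs into a dict.
pairs = list(zip(keys, fields))
meds = dict(pairs)
print(pairs)
print(meds["DrugName"])  # LIQUAEMIN SODIUM

# With too few fields, zip silently drops the unmatched keys.
print(dict(zip(keys, ["INJECTABLE; INJECTION"])))  # {'Form': 'INJECTABLE; INJECTION'}
```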

Upvotes: 0

Jill-Jênn Vie

Reputation: 1841

I think you've overestimated the number of columns in your file. Where are ApplNo and ProductNo?
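Rather than hard-coding nine column names, the keys can be taken from the file's own header line, so the column count cannot be overestimated. load_with_header below is a hypothetical Python 3 helper, not part of any library, and it assumes the first line is a tab-separated header:

```python
def load_with_header(lines, delimiter="\t"):
    """Return one dict per data line, keyed by the names in the first line."""
    it = iter(lines)
    keys = next(it).rstrip("\n").split(delimiter)
    records = []
    for line_no, line in enumerate(it, start=2):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing empty line
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != len(keys):
            raise ValueError("line %d has %d fields, expected %d"
                             % (line_no, len(fields), len(keys)))
        records.append(dict(zip(keys, fields)))
    return records


sample = [
    "Form\tDosage\tReferenceDrug\tdrugname\tactiveingred\n",
    "INJECTABLE; INJECTION\t20,000 UNITS/ML\t\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n",
    "\n",
]
records = load_with_header(sample)
print(records[0]["drugname"])  # LIQUAEMIN SODIUM
```

With the real file, the open file object can be passed directly: with open("druglist.txt") as f: records = load_with_header(f).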

Upvotes: 0

Daniel Nouri

Reputation: 1274

Try line.split instead of l.split?
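A quick Python 3 check of why that matters: l is the constant " ", so splitting it on tabs always gives a one-element list, and fields[1] is already out of range. The sample line here is hypothetical:

```python
l = " "  # what the question's loop actually splits
line = "INJECTABLE; INJECTION\t20,000 UNITS/ML\t\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n"

print(l.split("\t"))          # [' ']
print(len(line.split("\t")))  # 5

# The last field keeps its newline unless it is stripped first.
print(repr(line.split("\t")[-1]))               # 'HEPARIN SODIUM\n'
print(repr(line.rstrip("\n").split("\t")[-1]))  # 'HEPARIN SODIUM'

try:
    l.split("\t")[1]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```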

Upvotes: 2
