Reputation: 55
I have a text file, and I want to load it into a dictionary in Python.
The text looks like this, tab-delimited:
Form Dosage ReferenceDrug drugname activeingred INJECTABLE; INJECTION 20,000 UNITS/ML LIQUAEMIN SODIUM HEPARIN SODIUM INJECTABLE; INJECTION 40,000 UNITS/ML LIQUAEMIN SODIUM HEPARIN SODIUM INJECTABLE; INJECTION 5,000 UNITS/ML LIQUAEMIN SODIUM HEPARIN SODIUM
Right now my code looks like this, but it does not work (list index out of range, and nothing pushed to the dictionary). I don't know where I'm going wrong; I'm not a programmer. Thanks for any help.
import sys

def load_medications(filename):
    meds_dict = {}
    f = open(filename)
    l = " "
    # print f.read()
    for line in f:
        fields = l.split("\t")
        ApplNo = fields[0]
        ProductNo = fields[1]
        Form = fields[2]
        Dosage = fields[3]
        ProductMktStatus = fields[4]
        TECode = fields[5]
        ReferenceDrug = fields[6]
        DrugName = fields[7]
        ActiveIngred = fields[8]
        meds = {
            "ApplNo": ApplNo,
            "ProductNo": ProductNo,
            "Form": Form,
            "Dosage": Dosage,
            "ProductMktStatus": ProductMktStatus,
            "TECode": TECode,
            "ReferenceDrug": ReferenceDrug,
            "DrugName": DrugName,
            "ActiveIngred": ActiveIngred
        }
        meds_dict[DrugName] = meds
    f.close()
    return meds_dict

def main():
    x = load_medications("druglist.txt")
    print x

if __name__ == "__main__":
    main()
Upvotes: 1
Views: 144
Reputation: 63709
Since your field names are all valid Python identifiers, why not read your data into namedtuples instead of dicts?
data = """Form Dosage ReferenceDrug drugname activeingred INJECTABLE; INJECTION 20,000 UNITS/ML LIQUAEMIN SODIUM HEPARIN SODIUM INJECTABLE; INJECTION 40,000 UNITS/ML LIQUAEMIN SODIUM HEPARIN SODIUM INJECTABLE; INJECTION 5,000 UNITS/ML LIQUAEMIN SODIUM HEPARIN SODIUM INJECTABLE""".split('; ')
from collections import namedtuple
# define class DrugData as a namedtuple, using the headers from data[0]
DrugData = namedtuple("DrugData", data[0])
# use a list comprehension to create a DrugData for each data line
druglist = [DrugData(*line.split('\t')) for line in data[1:]]
# access each tuple in druglist, using attribute access to individual fields
for d in druglist:
    print "%s | %s | %s" % (d.ReferenceDrug, d.Form, d.Dosage)
Prints:
LIQUAEMIN | INJECTION | 20,000 UNITS/ML
LIQUAEMIN | INJECTION | 40,000 UNITS/ML
LIQUAEMIN | INJECTION | 5,000 UNITS/ML
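For readers on Python 3, here is a minimal sketch of the same namedtuple idea. It assumes one record per line with real tab characters between fields; the three-column sample text is made up for illustration and is not the poster's actual file.

```python
from collections import namedtuple

# Hypothetical tab-delimited text: a header line, then one record per line.
data = (
    "Form\tDosage\tDrugName\n"
    "INJECTION\t20,000 UNITS/ML\tLIQUAEMIN SODIUM\n"
    "INJECTION\t40,000 UNITS/ML\tLIQUAEMIN SODIUM\n"
).splitlines()

# Build the record class from the header fields.
DrugData = namedtuple("DrugData", data[0].split("\t"))

# One DrugData per data line.
druglist = [DrugData(*line.split("\t")) for line in data[1:]]

# Attribute access to individual fields.
summary = ["%s | %s" % (d.DrugName, d.Dosage) for d in druglist]
for entry in summary:
    print(entry)
```

Splitting the header on tabs, rather than passing the raw string, keeps the field names aligned with the columns even if a header ever contained a space.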
EDIT:
Looking back at your original question, it looks like you want to create a single dict of all of these entries, keyed by drugname. Unfortunately, dict keys have to be unique, and in your example, all 3 entries have the same drugname. You may have to combine 2 or more fields to compose a truly unique key for a dict that handles all of these values, such as a tuple of (drugname, Dosage).
OR, change your design slightly so that each drugname points to a list of matching values. The simplest would be to use a defaultdict instead of a dict, so that new entries are automatically initialized with an empty list. In your code, you would add an import statement:
from collections import defaultdict
and change the declaration of meds_dict to:
meds_dict = defaultdict(list)
meaning that any new keys that haven't been seen yet will be initialized using the function/class provided as the argument to defaultdict, in this case list.
Then to add new entries to meds_dict, instead of assigning with '=', you would append to the list of all matching meds/dosages:
meds_dict[DrugName].append(meds)
Now for any DrugName, you'll get the list of matching Form/Dosage/etc. records.
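To make the two designs concrete, here is a small sketch of both the composite-key dict and the defaultdict(list) grouping. The `records` list is a hypothetical stand-in for the dicts built while reading the file, and `by_name_and_dosage` is a name chosen only for this example.

```python
from collections import defaultdict

# Hypothetical records; in the question these would be built from the file.
records = [
    {"DrugName": "LIQUAEMIN SODIUM", "Dosage": "20,000 UNITS/ML"},
    {"DrugName": "LIQUAEMIN SODIUM", "Dosage": "40,000 UNITS/ML"},
    {"DrugName": "LIQUAEMIN SODIUM", "Dosage": "5,000 UNITS/ML"},
]

# Option 1: a composite (DrugName, Dosage) key keeps every record distinct.
by_name_and_dosage = {(r["DrugName"], r["Dosage"]): r for r in records}

# Option 2: each DrugName maps to a list of all matching records.
meds_dict = defaultdict(list)
for r in records:
    meds_dict[r["DrugName"]].append(r)

print(len(by_name_and_dosage))
print(len(meds_dict["LIQUAEMIN SODIUM"]))
```

With a plain dict keyed by DrugName alone, only the last of these three records would survive, since each assignment overwrites the previous one.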
Upvotes: 0
Reputation: 1
It looks like your code is assuming that there are 9 properties of a particular drug. The sample text file you posted, however, only has 5 properties. When you call fields = l.split("\t"), a list of only 5 elements will be returned, because there are only 5 elements in the "druglist.txt". So if you index into fields with a value greater than or equal to 5, e.g. fields[8], you will get an "index out of range" exception.
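The failure mode described here can be reproduced in a few lines (Python 3 syntax). The sample line is hypothetical; the only thing that matters is that it has five tab-separated fields.

```python
# A hypothetical line with only five tab-separated fields.
line = "INJECTION\t20,000 UNITS/ML\tLIQUAEMIN\tSODIUM\tHEPARIN SODIUM"
fields = line.split("\t")
print(len(fields))

try:
    fields[8]  # valid indexes are 0..4, so this raises IndexError
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```

Checking len(fields) before indexing is one way to turn this into a clearer error message.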
Upvotes: 0
Reputation: 15516
You might have an easier time parsing this data with the csv module in the standard library. If you rig it up with tabs as your separator and ; as your lineterminator, it should have no problems parsing the file you posted.
Using a DictReader would also make it a little easier to read over your rows (you could refer to things as line['ApplNo'] instead of line[0]).
Unfortunately, it doesn't look like the headers in your file map to what you want to call them in your code - so you would need to assign the names of the fields yourself based on what was in the dictionary.
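One possible way to assign your own names is to skip the file's header row and pass fieldnames to DictReader. This is only a sketch: the sample text and the "No" value are invented, and it assumes one record per line. As far as I know, the csv reader ignores lineterminator and always splits on newlines, so a file that really uses ; between records would need to be split by hand first.

```python
import csv
import io

# Hypothetical file contents: a header row followed by one record per line.
sample = (
    "Form\tDosage\tReferenceDrug\tdrugname\tactiveingred\n"
    "INJECTABLE; INJECTION\t20,000 UNITS/ML\tNo\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n"
)

# The names the code wants to use, in the same column order as the file.
names = ["Form", "Dosage", "ReferenceDrug", "DrugName", "ActiveIngred"]

f = io.StringIO(sample)  # stands in for open("druglist.txt")
next(f)                  # skip the file's own header row
reader = csv.DictReader(f, delimiter="\t", fieldnames=names)
rows = list(reader)

print(rows[0]["DrugName"])
```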
Upvotes: 2
Reputation: 208425
You should look into csv.DictReader for this. Assuming your file has a proper heading line at the beginning, you should be able to create your dictionaries as simply as something like this:
import csv

def load_medications(filename):
    reader = csv.DictReader(open(filename), delimiter='\t')
    meds = {}
    for row in reader:
        # key each row dict by its DrugName column
        meds[row['DrugName']] = row
    return meds
If your file does not have a heading line, you can pass in the field names to the DictReader initializer:
fields = ["ApplNo", "ProductNo", "Form", "Dosage", "ProductMktStatus",
          "TECode", "ReferenceDrug", "DrugName", "ActiveIngred"]
reader = csv.DictReader(open(filename), delimiter='\t', fieldnames=fields)
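A short sketch of what this looks like when a row has fewer columns than field names, which seems to be the situation in the question. The sample rows and their values (such as "000001" and "Rx") are invented for illustration. DictReader fills the missing columns with None (its default restval) instead of raising an error.

```python
import csv
import io

fields = ["ApplNo", "ProductNo", "Form", "Dosage", "ProductMktStatus",
          "TECode", "ReferenceDrug", "DrugName", "ActiveIngred"]

# Hypothetical headerless data: one full row and one short row.
sample = (
    "000001\t001\tINJECTABLE; INJECTION\t20,000 UNITS/ML\tRx\tAP\tNo"
    "\tLIQUAEMIN SODIUM\tHEPARIN SODIUM\n"
    "000002\t002\tINJECTABLE; INJECTION\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="\t", fieldnames=fields)
rows = list(reader)

print(rows[0]["DrugName"])
print(rows[1]["DrugName"])  # missing column, so the value is None
```

If short rows should be rejected rather than padded, checking for None values after reading is one option.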
Upvotes: 1
Reputation: 287775
You actually split l, and not line. You want:
def load_medications(filename):
    meds_dict = {}
    with open(filename) as f:  # Ensure that the file gets closed
        for line in f:
            fields = line.split("\t")  # line, not l
            keys = ["ApplNo", "ProductNo", "Form", "Dosage", "ProductMktStatus",
                    "TECode", "ReferenceDrug", "DrugName", "ActiveIngred",]
            if len(fields) != len(keys):
                raise ValueError("Malformed input line " + repr(line))
            meds = dict(zip(keys, fields))
            meds_dict[meds["DrugName"]] = meds
    return meds_dict
For details on why this works, read up on zip and dict.
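As a quick illustration of the zip and dict step on its own, here is a sketch with a shortened, hypothetical key list and sample line: zip pairs each key with the field in the same position, and dict turns those pairs into a mapping.

```python
# Shortened, hypothetical key list and line, just to show the pairing.
keys = ["Form", "Dosage", "DrugName"]
fields = "INJECTION\t20,000 UNITS/ML\tLIQUAEMIN SODIUM".split("\t")

# zip pairs items by position: ("Form", "INJECTION"), ("Dosage", ...), ...
pairs = list(zip(keys, fields))

# dict builds a mapping from those (key, value) pairs.
meds = dict(pairs)

print(meds["DrugName"])
```

Note that zip stops at the shorter of its inputs, which is why the length check before it is useful: without it, a short line would silently produce a dict with missing keys.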
Upvotes: 0
Reputation: 1841
I think you've overestimated the number of columns of your file. Where are ApplNo, ProductNo?
Upvotes: 0