bmeredith

Reputation: 59

Dictionary as key value?

I have been searching for an answer, probably without the right terminology, and have only come up with examples that use lists as dictionary values.

I need to take 20 CSV files and anonymize identifying student, teacher, school and district information for research purposes on testing data. The CSV files range from 20K to 50K rows and from 11 to 20 columns, and not all of them contain identical information.

One file may have:

studid, termdates, testname, score, standarderr

And another may have:

termdates, studid, studfirstname, studlastname, studdob, ethnicity, grade

And yet another may have:

termdates, studid, teacher, classname, schoolname, districtname

I am putting the varying data into dictionaries for each type of file/dataset. Maybe this isn't the best approach, but I am getting stuck when trying to use a dictionary as the value for a key, for cases where a student may have taken multiple tests, e.g. Language, Reading, Math, etc.

For instance:

studDict{studid{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'},
        studid1{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'}}

Any guidance on which libraries or a brief direction to a method would be greatly appreciated. I understand enough Python that I do not need a full hand holding, but helping me get across the street would be great. :D

CLARIFICATION

I have a better chance of winning the lottery than this project does of being used more than once, so the simpler the method the better. If it were a recurring project, I would most likely dump the data into database tables and work from there.

Upvotes: 0

Views: 157

Answers (5)

Narcisse Doudieu Siewe

Reputation: 649

Please be more explicit. The solution depends on the design.

A district contains schools, and each school contains teachers and students.

First, organize your data by district and school:

    districts = { 
                 "name_district1":{...}, 
                 "name_district2":{...},
                 ...,
                 "name_districtn":{...},
                }

For each district:

    # "name_districtn"
      {
        "name_school1": {...},
        "name_school2": {...},
        ...,
        "name_schooln": {...},
      }

For each school (here "name_schooln"):

{
  id_student1: {...},
  id_student2: {...},
  ...,
  id_studentn: {...}  
}

And for each student, you define that student's own fields.

You can also define a single dictionary for all students, but in that case you have to design a unique ID for each student, for example:

    student_id = 12345  # placeholder value
    uniq_Id = "".join(("name_district", "name_school", str(student_id)))
    Total = {
        uniq_Id: {'dob': '1/1/1',
                  'test1': {'score': 50, 'date': '1/1/15'},
                  'test2': {'score': 50, 'date': '1/1/15'},
                  'school': 'Hard Knocks'},
        # ... one entry per remaining student
    }

Upvotes: 0

Navith

Reputation: 1069

If you can know the order of a file ahead of time, it's not hard to make a dictionary for it with help from csv.

File tests.csv:

12345,2015-05-19,AP_Bio,96,0.12
67890,2015-04-28,AP_Calc,92,0.17

In a Python file in the same directory as tests.csv:

import csv

with open("tests.csv") as tests:
    # Change the fields for files that follow a different form
    fields = ["studid", "termdates", "testname", "score", "standarderr"]
    students_data = list(csv.DictReader(tests, fieldnames=fields))

# Just a pretty show
print(*students_data, sep="\n")
# {'studid': '12345', 'testname': 'AP_Bio', 'standarderr': '0.12', 'termdates': '2015-05-19', 'score': '96'}
# {'studid': '67890', 'testname': 'AP_Calc', 'standarderr': '0.17', 'termdates': '2015-04-28', 'score': '92'}
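Building on the same DictReader idea, here is a rough sketch of how the anonymizing step from the question might look. The sample rows, the choice of which columns count as identifying, and the sequential replacement IDs are all assumptions for illustration; an in-memory io.StringIO stands in for a real file.

```python
import csv
import io
import itertools

# Stand-in for a roster file (made-up rows).
roster = io.StringIO(
    "2015-05-19,12345,Ann,Lee,2001-01-01,X,9\n"
    "2015-04-28,67890,Bob,Ray,2000-02-02,Y,10\n"
    "2015-05-20,12345,Ann,Lee,2001-01-01,X,9\n"
)
fields = ["termdates", "studid", "studfirstname", "studlastname",
          "studdob", "ethnicity", "grade"]

# Assumption: these are the columns that must not survive.
identifying = {"studfirstname", "studlastname", "studdob"}

new_ids = {}                  # original studid -> replacement id
counter = itertools.count(1)  # simple sequential replacement ids

def anonymize(row):
    """Drop identifying columns and swap studid for a stable new id."""
    original = row["studid"]
    if original not in new_ids:
        new_ids[original] = next(counter)
    cleaned = {k: v for k, v in row.items() if k not in identifying}
    cleaned["studid"] = new_ids[original]
    return cleaned

anonymized = [anonymize(row)
              for row in csv.DictReader(roster, fieldnames=fields)]
print(*anonymized, sep="\n")
```

Reusing the same new_ids mapping across all 20 files would keep a given student's replacement ID consistent from file to file.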

Upvotes: 0

jorgeh

Reputation: 1767

If I interpret you correctly, in the end you want a dict with students (i.e. studid) as key and different student related data as value? This is probably not exactly what you want, but I think it will point you in the right direction (adapted from this answer):

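import csv
from collections import namedtuple, defaultdict

# `files` is assumed to be a list of CSV paths defined elsewhere.
D = defaultdict(list)
for filename in files:
    with open(filename, mode="r", newline="") as infile:
        reader = csv.reader(infile)
        # The header row supplies the field names for this file;
        # they must be valid Python identifiers.
        Data = namedtuple("Data", next(reader))
        for row in reader:
            data = Data(*row)
            # Collect every row for the same student under one key
            D[data.studid].append(data)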

In the end that should give you a dict D with studid values as keys and a list of rows (e.g. test results) as values. Each row is a namedtuple. This assumes that every file has a header row that includes a studid column!
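If you would rather end up with the nested shape from the question (one dict per student, with a sub-dict per test), the same loop can be adapted. This is only a sketch under some assumptions: every file has a header row, test files are recognized by a testname column, and the in-memory sample data and the "tests" key are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Stand-ins for two of the CSV files (made-up rows).
files = {
    "tests": io.StringIO(
        "studid,termdates,testname,score,standarderr\n"
        "12345,2015-05-19,Reading,96,0.12\n"
        "12345,2015-05-19,Math,88,0.20\n"
    ),
    "roster": io.StringIO(
        "termdates,studid,teacher,classname,schoolname,districtname\n"
        "2015-05-19,12345,Smith,Room 1,Hard Knocks,District 1\n"
    ),
}

# Each student starts with an empty dict of tests.
students = defaultdict(lambda: {"tests": {}})

for name, infile in files.items():
    for row in csv.DictReader(infile):
        record = students[row.pop("studid")]
        if "testname" in row:
            # One nested dict per test, keyed by the test name
            record["tests"][row.pop("testname")] = row
        else:
            # Non-test columns go directly on the student record
            record.update(row)

print(students["12345"]["tests"]["Math"]["score"])
```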

Upvotes: 0

Eric Renouf

Reputation: 14510

You cannot use a dictionary as a key to a dictionary. Keys must be hashable, and dictionaries, being mutable, are not, so they cannot be used as keys.

You can store a dictionary in another dictionary as a value, just the same as any other value. For example, you can do

studDict = { studid: {'newid': 12345, 'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
    studid1: {'newid': 12345, 'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'}}

assuming you have defined studid and studid1 elsewhere.
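To make the key/value distinction concrete, here is a small sketch with made-up values. It shows a dict being rejected as a key, the same dict being accepted as a nested value, and (as an aside that the question does not need) a frozenset of its items working as a key because it is hashable.

```python
inner = {"score": 50, "date": "1/1/15"}

# Using a dict as a key raises TypeError because dicts are unhashable.
try:
    bad = {inner: "value"}
    error = None
except TypeError as exc:
    error = str(exc)

# Using the same dict as a value, at any depth, is fine.
stud_dict = {12345: {"tests": {"Reading": inner}}}

# If a dict-like key were ever needed, a frozenset of its items is
# hashable as long as all the values are hashable too.
lookup = {frozenset(inner.items()): "value"}

print(error)
print(stud_dict[12345]["tests"]["Reading"]["score"])
```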

Upvotes: 1

manglano

Reputation: 844

A dictionary cannot be a key, but a dictionary can be a value for some key in another dictionary (a dict-of-dicts). However, instantiating dictionaries of varying length for every tuple is probably going to make your data analysis very difficult.

Consider using Pandas to read the tuples into a DataFrame with null values where appropriate.
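As a rough illustration of that suggestion, the sketch below reads two small made-up tables and combines them on studid with an outer merge, so students missing from one file get null (NaN) values in that file's columns. The column names follow the question; the data and the choice of an outer merge are assumptions.

```python
import io

import pandas as pd

# Stand-ins for two of the CSV files (made-up rows).
tests = pd.read_csv(io.StringIO(
    "studid,testname,score\n"
    "12345,Reading,96\n"
    "67890,Math,92\n"
))
roster = pd.read_csv(io.StringIO(
    "studid,grade,schoolname\n"
    "12345,9,Hard Knocks\n"
))

# An outer merge keeps every student from both tables and fills
# the columns a student has no data for with NaN.
merged = tests.merge(roster, on="studid", how="outer")
print(merged)
```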

dict API: https://docs.python.org/2/library/stdtypes.html#mapping-types-dict

Pandas Data handling package: http://pandas.pydata.org/

Upvotes: 1
