Reputation: 59
I have been searching for an answer, probably without using the right terminology, and have only found results about using lists as dictionary key values.
I need to take 20 csv files and anonymize identifying student, teacher, school and district information in testing data for research purposes. The csv files range from 20K to 50K rows and 11 to 20 columns, and not all of them contain the same columns.
One file may have:
studid, termdates, testname, score, standarderr
And another may have:
termdates, studid, studfirstname, studlastname, studdob, ethnicity, grade
And yet another may have:
termdates, studid, teacher, classname, schoolname, districtname
I am putting the varying data into dictionaries, one for each type of file/dataset. Maybe this isn't the best approach, but I am getting stuck when trying to use a dictionary as a key value for cases where a student may have taken multiple tests, e.g. Language, Reading, Math etc.
For instance:
studDict{studid{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'},
studid1{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'}}
Any guidance on which libraries or a brief direction to a method would be greatly appreciated. I understand enough Python that I do not need a full hand holding, but helping me get across the street would be great. :D
CLARIFICATION
I have a better chance of winning the lottery than this project does of being used more than once, so the simpler the method the better. If it were a recurring project, I would most likely dump the data into db tables and work from there.
Upvotes: 0
Views: 157
Reputation: 649
Please be more explicit; the solution depends on the design.
Each district has schools, and each school has teachers and students.
First, organize your data by district and school:
districts = {
"name_district1":{...},
"name_district2":{...},
...,
"name_districtn":{...},
}
for each district:
# "name_districtn"
{
"name_school1": {...},
"name_school2": {...},
...,
"name_schooln": {...},
}
for each school: #"name_schooln"
{
id_student1: {...},
id_student2: {...},
...,
id_studentn: {...}
}
And for each student, you define their fields.
You can also use a single dictionary for all students, but in that case you have to design a unique id for each student, for example:
uniq_Id = "".join(("name_district","name_school", str(student_id)))
Total = {
uniq_Id: {'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
...,
}
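A minimal sketch of the single-dictionary variant, assuming a tuple of (district, school, student id) as the key instead of a joined string. Tuples of hashable items are themselves hashable, and they avoid the collisions that plain concatenation can produce (for example student 1 in school "S1" versus student 11 in school "S"). The sample records and the `tests` sub-key are made up for illustration.

```python
# Hypothetical sample records: (district, school, student id, test name, score)
records = [
    ("District A", "Hard Knocks", 1, "Reading", 50),
    ("District A", "Hard Knocks", 1, "Math", 61),
    ("District A", "Hard Knocks", 11, "Reading", 47),
]

total = {}
for district, school, student_id, test, score in records:
    key = (district, school, student_id)  # a tuple is a valid dict key
    if key not in total:
        total[key] = {"tests": {}}
    # Each test gets its own nested dict under the student's entry
    total[key]["tests"][test] = {"score": score}

print(total[("District A", "Hard Knocks", 1)]["tests"]["Math"]["score"])
```

Because the key keeps its parts separate, it is still possible to loop over `total` and filter by district or school without parsing a string.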
Upvotes: 0
Reputation: 1069
If you know the column order of a file ahead of time, it's not hard to make a dictionary for it with help from the csv module.
File tests.csv:
12345,2015-05-19,AP_Bio,96,0.12
67890,2015-04-28,AP_Calc,92,0.17
In a Python file in the same directory as tests.csv:
import csv

with open("tests.csv") as tests:
    # Change the fields for files that follow a different form
    fields = ["studid", "termdates", "testname", "score", "standarderr"]
    students_data = list(csv.DictReader(tests, fieldnames=fields))

# Just a pretty show
print(*students_data, sep="\n")
# {'studid': '12345', 'testname': 'AP_Bio', 'standarderr': '0.12', 'termdates': '2015-05-19', 'score': '96'}
# {'studid': '67890', 'testname': 'AP_Calc', 'standarderr': '0.17', 'termdates': '2015-04-28', 'score': '92'}
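Building on the DictReader approach, here is a rough sketch of how the anonymizing step itself might look. It assumes a simple sequential numbering scheme for the new ids (an illustration, not a requirement from the question) and uses an in-memory `io.StringIO` in place of a real file so it runs on its own.

```python
import csv
import io
import itertools

# Stand-in for a real file so the sketch is self-contained
raw = io.StringIO(
    "12345,2015-05-19,AP_Bio,96,0.12\n"
    "67890,2015-04-28,AP_Calc,92,0.17\n"
    "12345,2015-05-20,AP_Calc,88,0.15\n"
)
fields = ["studid", "termdates", "testname", "score", "standarderr"]

new_ids = {}                  # original studid -> anonymous id
counter = itertools.count(1)  # hypothetical id scheme: 1, 2, 3, ...

anonymized = []
for row in csv.DictReader(raw, fieldnames=fields):
    original = row["studid"]
    if original not in new_ids:
        new_ids[original] = next(counter)
    row["studid"] = new_ids[original]  # same student always maps to the same id
    anonymized.append(dict(row))

for row in anonymized:
    print(row)
```

Reusing the same `new_ids` mapping across all the files would keep a student's anonymous id consistent from one file to the next.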
Upvotes: 0
Reputation: 1767
If I interpret you correctly, in the end you want a dict with students (i.e. studid) as key and different student-related data as value? This is probably not exactly what you want, but I think it will point you in the right direction (adapted from this answer):
import csv
from collections import namedtuple, defaultdict

# `files` is assumed to be a list of CSV paths, each with a header row
D = defaultdict(list)
for filename in files:
    with open(filename, mode="r") as infile:
        reader = csv.reader(infile)
        Data = namedtuple("Data", next(reader))
        for row in reader:
            data = Data(*row)
            D[data.studid].append(data)
In the end that should give you a dict D with studids as keys and a list of test results as values. Each test result is a namedtuple. This assumes that every file has a studid column!
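One caveat with the namedtuple approach is that header names must be valid Python identifiers. As a hedged alternative, the same grouping can be sketched with csv.DictReader, which takes its keys from the header row and accepts any column names. The file names and contents below are hypothetical, and in-memory strings stand in for real files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory "files" with different column layouts
files = {
    "tests.csv": (
        "studid,termdates,testname,score\n"
        "1,2015-05-19,Math,50\n"
        "1,2015-05-19,Reading,55\n"
    ),
    "classes.csv": (
        "termdates,studid,teacher,schoolname\n"
        "2015-05-19,1,Smith,Hard Knocks\n"
        "2015-05-19,2,Jones,Hard Knocks\n"
    ),
}

by_student = defaultdict(list)
for name, text in files.items():
    # With no fieldnames given, the first row supplies the keys
    for row in csv.DictReader(io.StringIO(text)):
        by_student[row["studid"]].append(dict(row))

print(len(by_student["1"]))
```

Each student id then maps to a list of row dicts, and rows from different files simply carry different keys.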
Upvotes: 0
Reputation: 14510
You cannot use a dictionary as a key to a dictionary. Keys must be hashable, and dictionaries are mutable and therefore unhashable, so they cannot be used as keys.
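A short sketch of what that looks like in practice: a dict used as a key raises TypeError, while a tuple of hashable items is accepted. The variable names are only for illustration.

```python
lookup = {}

try:
    lookup[{"studid": 1}] = "record"  # a dict as a key is rejected
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

lookup[("studid", 1)] = "record"      # a tuple of hashable items is fine
print(lookup[("studid", 1)])
```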
You can store a dictionary in another dictionary just the same as any other value. You can, for example, do
# Dates are quoted as strings; a bare 1/1/15 would be evaluated as division
studDict = {
    studid: {'newid': 12345, 'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
    studid1: {'newid': 12345, 'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
}
assuming you have defined studid and studid1 elsewhere.
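For the multiple-tests case from the question, a dict-of-dicts like this can be filled in row by row. The following is a minimal sketch under assumed inputs: the row layout, the `tests` sub-key, and the sequential `newid` scheme are all illustrative choices.

```python
# Hypothetical rows: (studid, testname, score, date)
rows = [
    ("A17", "Language", 50, "1/1/15"),
    ("A17", "Math", 62, "1/2/15"),
    ("B42", "Reading", 71, "1/1/15"),
]

stud_dict = {}
for studid, testname, score, date in rows:
    # Create the student's entry the first time the id is seen
    if studid not in stud_dict:
        stud_dict[studid] = {"newid": len(stud_dict) + 1, "tests": {}}
    # Each test is stored as a nested dict keyed by test name
    stud_dict[studid]["tests"][testname] = {"score": score, "date": date}

print(stud_dict["A17"]["tests"]["Math"]["score"])
```

Keeping the tests under their own key means a student can have any number of them without the keys colliding with fields such as 'dob' or 'school'.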
Upvotes: 1
Reputation: 844
A dictionary cannot be a key, but a dictionary can be a value for some key in another dictionary (a dict-of-dicts). However, instantiating dictionaries of varying length for every tuple is probably going to make your data analysis very difficult.
Consider using Pandas to read the tuples into a DataFrame with null values where appropriate.
dict API: https://docs.python.org/2/library/stdtypes.html#mapping-types-dict
Pandas data handling package: http://pandas.pydata.org/
Upvotes: 1