Reputation: 123
I have to create a big dictionary for my measurement data. My (simplified) code looks like this so far:
i = 0
for i in range(len(station_data_files_pandas)):  # range(0, 299)
    station_data_f_pandas = station_data_files_pandas[i]
    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))
    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)
    # creating the dictionary layer for the annual data
    anual_data = {
        "Y_RR" : Y_RR
    }
    # creating the dictionary layer for the monthly data
    montly_data = {
        "MO_RR"
    }
    # creating the dictionary layer for every station; every station has monthly and annual data
    station = {
        "montly_data" : montly_data,
        "anual_data" : anual_data
    }
    # creating the dictionary layer where the station data can be looked up by station id
    station_data_dic = {
        station_id : station
    }
    # creating the final layer of the dictionary
    station_data_dictionary = {
        "station_data": station_data_dic
    }
This is the output:
station_data_dictionary
Out[387]:
{'station_data': {'4706': {'montly_data': {'MO_RR'}, # "4706" is the id from the last element in station_data_files_pandas
'anual_data': {'Y_RR': YearMonth
# YearMonth is the index...
# I actually wanted the Index just to show yyyy-mm ...
1981-12-31 1164.3
1982-12-31 852.4
1983-12-31 826.5
1984-12-31 798.8
1985-12-31 NaN
1986-12-31 NaN
1987-12-31 NaN
1988-12-31 NaN
1989-12-31 NaN
1990-12-31 1101.1
1991-12-31 892.4
1992-12-31 802.1
1993-12-31 873.5
1994-12-31 842.7
1995-12-31 962.0
1996-12-31 NaN
1997-12-31 927.9
1998-12-31 NaN
1999-12-31 NaN
2000-12-31 997.8
2001-12-31 986.3
2002-12-31 1117.6
2003-12-31 690.8
2004-12-31 NaN
2005-12-31 NaN
2006-12-31 NaN
2007-12-31 NaN
2008-12-31 NaN
2009-12-31 NaN
2010-12-31 NaN
Freq: A-DEC, Name: MO_RR, dtype: float64}}}}
As you can see, my output consists of just one "sheet". I expected 300 sheets.
I assume my code overwrites the data on each pass through the loop, so that in the end the output contains only one sheet, made from the last element in station_data_files_pandas. How can I fix this? Is my approach perhaps entirely wrong?
When it is finished, I need to be able to access the data like this:
station_data_dictionary["station_data"]["403"]["anual_data"]["Y_RR"]
station_data_dictionary["station_data"]["573"]["anual_data"]["Y_RR"]
station_data_dictionary["station_data"]["96"]["anual_data"]["Y_RR"]
...and so on.
As you can see, the only thing that should change between these lookups is the station_id.
Note: There is one question with the exact same title, but it was not helpful to me at all...
Upvotes: 0
Views: 80
Reputation: 1474
Try the code below. If you need the dictionary to keep the order in which you added the entries, you can use an OrderedDict from the collections package (this is required on Python versions before 3.7; from 3.7 on, a plain dict also preserves insertion order).
That way, when you print the dictionary or loop through its data, you get the entries in the same order in which they were added in the code below.
Note: I'm assuming station_data_files_pandas is a list, not a dictionary, which is why I changed the for loop to iterate over the elements directly. If I'm wrong and this variable is in fact a dictionary whose keys are the integers of the for loop, you could loop through its items like this:
for k, v in station_data_files_pandas.items():
    # now k carries the integer you were using before
    # and v carries station_data_f_pandas
import collections

station_data_dictionary = collections.OrderedDict()

# for i in range(len(station_data_files_pandas)):  # range(0, 299)
# iterating over the list elements directly instead of over indices
for station_data_f_pandas in station_data_files_pandas:
    # This is not needed anymore
    # station_data_f_pandas = station_data_files_pandas[i]
    # station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))
    # You could directly convert to string
    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))
    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)
    MO_RR = ...  # placeholder: something goes here
    # creating the dictionary layer for the annual data
    anual_data = {
        "Y_RR" : Y_RR
    }
    # creating the dictionary layer for the monthly data
    montly_data = {
        # "MO_RR"
        # You can't have just a key in your dictionary, you need to assign a value to it.
        "MO_RR": MO_RR
    }
    # creating the dictionary layer for every station; every station has monthly and annual data
    station = {
        "montly_data" : montly_data,
        "anual_data" : anual_data
    }
    # creating the dictionary layer where the station data can be looked up by station id
    station_data_dic = {
        station_id : station
    }
    # creating the final layer of the dictionary
    # station_data_dictionary = {
    #     "station_data": station_data_dic
    # }
    # Why use {"apparently_useless_id_layer": {"actual_id_info": "data"}}
    # instead of {"actual_info_id": "data"} ?
    station_data_dictionary[station_id] = station
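To illustrate the ordering remark, here is a minimal sketch comparing a plain dict with an OrderedDict when both are filled inside a loop. The station ids are taken from the question, but the list station_results and its numbers are made up for illustration and stand in for the real per-station results.

```python
from collections import OrderedDict

# Hypothetical stand-in for the per-station results: (station_id, yearly sum).
# The ids come from the question; the values are invented.
station_results = [("403", 812.5), ("573", 640.0), ("96", 955.1)]

plain = {}
ordered = OrderedDict()
for station_id, yearly_sum in station_results:
    entry = {"anual_data": {"Y_RR": yearly_sum}}
    # One new key per loop pass; existing entries are left untouched.
    plain[station_id] = entry
    ordered[station_id] = entry

# On Python 3.7+ both containers return their keys in insertion order.
print(list(plain))
print(list(ordered))
```

If the code only ever runs on Python 3.7 or later, the plain dict is probably enough; OrderedDict mainly matters for older interpreters or when its extra methods are needed.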
Upvotes: 1
Reputation: 4576
I haven't tested this as I don't have your data, but this should produce your required dictionary. The only changes are at the top and bottom:
station_data_dictionary = {
    "station_data": {}
}
for i in range(len(station_data_files_pandas)):  # range(0, 299)
    station_data_f_pandas = station_data_files_pandas[i]
    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))
    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)
    # creating the dictionary layer for the annual data
    anual_data = {
        "Y_RR" : Y_RR
    }
    # creating the dictionary layer for the monthly data
    montly_data = {
        "MO_RR"
    }
    # creating the dictionary layer for every station; every station has monthly and annual data
    station = {
        "montly_data" : montly_data,
        "anual_data" : anual_data
    }
    station_data_dictionary["station_data"][station_id] = station
Note that you don't need statements like i = 0 before a for loop, as the loop initialises the variable for you.
Also, the "station_data" layer of the dictionary seems superfluous, as it is the only key at that layer, but you had it in your required output so I left it in.
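The difference between the original loop and the fixed one can be seen in a small, self-contained sketch. It does not use the real DataFrames: the dict measurements and its values are hypothetical placeholders, and only the station ids come from the question.

```python
# Hypothetical station ids mapped to invented yearly sums.
measurements = {"403": 812.5, "573": 640.0, "96": 955.1}

# Rebinding the whole dictionary on every pass: only the last station survives.
for station_id, y_rr in measurements.items():
    overwritten = {"station_data": {station_id: {"anual_data": {"Y_RR": y_rr}}}}

# Creating the container once and adding one key per pass: all stations survive.
accumulated = {"station_data": {}}
for station_id, y_rr in measurements.items():
    accumulated["station_data"][station_id] = {"anual_data": {"Y_RR": y_rr}}

print(len(overwritten["station_data"]))   # 1
print(len(accumulated["station_data"]))   # 3
print(accumulated["station_data"]["573"]["anual_data"]["Y_RR"])  # 640.0
```

The lookup in the last line has the same shape as the access pattern requested in the question, with only the station id changing between calls.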
Upvotes: 1