Rebecca

Reputation: 123

Creating a dictionary by for loop (?)

I have to create a big dictionary for my measurement data. My (simplified) code looks like this so far:

i = 0  

for i in range(len(station_data_files_pandas)):  # range(0, 299)
    station_data_f_pandas = station_data_files_pandas[i]
    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))
    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)

    # creating the dictionary layer for the anual data in this dictionary
    anual_data = {
            "Y_RR" : Y_RR
            }
    # creating the dictionary layer for the montly data in this dictionary
    montly_data = {
            "MO_RR"    
            }
    # creating the dictionary layer for every station. Everystation has montly and anual data
    station = {
            "montly_data" : montly_data,
            "anual_data" : anual_data
            }
    # creating the dictionary layer where the staiondata can get called by station id
    station_data_dic = {
            station_id : station
            }
    # creating the final layer of the dictionary
    station_data_dictionary = {
            "station_data": station_data_dic
            }    

This is the output:

station_data_dictionary
Out[387]: 
{'station_data': {'4706': {'montly_data': {'MO_RR'},   # "4706" is the id from the last element in station_data_files_pandas
   'anual_data': {'Y_RR': YearMonth
           # YearMonth is the index...
           # I actually wanted the Index just to show yyyy-mm ...
    1981-12-31    1164.3
    1982-12-31     852.4
    1983-12-31     826.5
    1984-12-31     798.8
    1985-12-31       NaN
    1986-12-31       NaN
    1987-12-31       NaN
    1988-12-31       NaN
    1989-12-31       NaN
    1990-12-31    1101.1
    1991-12-31     892.4
    1992-12-31     802.1
    1993-12-31     873.5
    1994-12-31     842.7
    1995-12-31     962.0
    1996-12-31       NaN
    1997-12-31     927.9
    1998-12-31       NaN
    1999-12-31       NaN
    2000-12-31     997.8
    2001-12-31     986.3
    2002-12-31    1117.6
    2003-12-31     690.8
    2004-12-31       NaN
    2005-12-31       NaN
    2006-12-31       NaN
    2007-12-31       NaN
    2008-12-31       NaN
    2009-12-31       NaN
    2010-12-31       NaN
    Freq: A-DEC, Name: MO_RR, dtype: float64}}}}

As you can see, my output consists of just one "sheet". I expected 300 sheets.

I assume my code overwrites the data on each pass through the loop, so that in the end my output is just one sheet built from the last element in station_data_files_pandas. How can I fix this? Is my approach maybe entirely wrong?

When it is finished, I need to be able to access it like this:

station_data_dictionary["station_data"]["403"]["anual_data"]["Y_RR"]
station_data_dictionary["station_data"]["573"]["anual_data"]["Y_RR"]
station_data_dictionary["station_data"]["96"]["anual_data"]["Y_RR"]

...and so on.

As you can see, the only thing that should change between these calls is the station_id.

Note: There is one question with the exact same title, but it was not helpful to me at all...

Upvotes: 0

Views: 80

Answers (2)

Teodoro

Reputation: 1474

Try the code below. Also, if you need the dictionary to keep the order in which you added the entries, you can use an OrderedDict from the collections module (on Python 3.7 and later, a plain dict keeps insertion order as well).

That way, when you print the dictionary or loop through its data, the entries come out in the same order in which they were added in the code below.
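As a small illustration of the ordering point, here is a sketch that fills a plain dict and an OrderedDict with the same keys. The station ids are just sample keys taken from the question, and None is a stand-in for the real station data.

```python
from collections import OrderedDict

plain = {}
ordered = OrderedDict()

# insert the same keys, in the same order, into both containers
for station_id in ["403", "573", "96"]:
    plain[station_id] = None
    ordered[station_id] = None

# on Python 3.7+ both containers iterate in insertion order
plain_keys = list(plain)
ordered_keys = list(ordered)
```

On older Python versions only the OrderedDict gives this guarantee.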

Note: I'm assuming station_data_files_pandas is a list, not a dictionary, which is why I changed the for loop to iterate over the elements directly instead of over indices. If I'm wrong and this variable is in fact a dictionary whose keys are the integers from the for loop, you could also loop through the items like this:

for k, v in station_data_files_pandas.items():
    # now k carries the integer you were using before.
    # and v carries station_data_f_pandas

Code correction

import collections

station_data_dictionary = collections.OrderedDict()

# for i in range(len(station_data_files_pandas)):  # range(0, 299)
# iterate over the elements directly instead of over indices
for station_data_f_pandas in station_data_files_pandas:

    # This is not needed anymore
    # station_data_f_pandas = station_data_files_pandas[i]

    # same conversion as in the question
    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))

    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)
    MO_RR = None  # placeholder: assign the station's monthly data here


    # creating the dictionary layer for the anual data in this dictionary
    anual_data = {
            "Y_RR" : Y_RR
            }

    # creating the dictionary layer for the montly data in this dictionary
    montly_data = {
            # "MO_RR"
            # You can't have just a key to your dictionary, you need to assign a value to it.

            "MO_RR": MO_RR             
            }

    # creating the dictionary layer for every station. Everystation has montly and anual data
    station = {
            "montly_data" : montly_data,
            "anual_data" : anual_data
            }

    # creating the dictionary layer where the staiondata can get called by station id

    station_data_dic = {
            station_id : station
            }


    # creating the final layer of the dictionary
    #station_data_dictionary = {
    #       "station_data": station_data_dic
    #        }

    # Why use {"apparently_useless_id_layer": {"actual_id_info": "data"}}
    # instead of {"actual_info_id": "data"} ?
    station_data_dictionary[station_id] = station
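A side note on the "MO_RR" comment in the code above: as far as I can tell, braces around a bare value do not raise an error in Python; they build a set, which is why the question's output shows 'montly_data': {'MO_RR'}. A minimal sketch of the difference, with None as a stand-in value:

```python
# braces around a bare value build a set, not a dictionary
monthly_as_set = {"MO_RR"}

# a key with an explicit value builds a dictionary (None is a stand-in value)
monthly_as_dict = {"MO_RR": None}

kind_of_set = type(monthly_as_set).__name__
kind_of_dict = type(monthly_as_dict).__name__
```

So the set version cannot be indexed by key later, while the dictionary version can.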

Upvotes: 1

Seb

Reputation: 4576

I haven't tested this as I don't have your data, but this should produce your required dictionary. The only changes are at the top and bottom:

station_data_dictionary = {
    "station_data": {}
}

for i in range(len(station_data_files_pandas)):  # range(0, 299)

    station_data_f_pandas = station_data_files_pandas[i]

    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))

    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)

    # creating the dictionary layer for the anual data in this dictionary
    anual_data = {
            "Y_RR" : Y_RR
            }

    # creating the dictionary layer for the montly data in this dictionary
    montly_data = {
            "MO_RR"    
            }

    # creating the dictionary layer for every station. Everystation has montly and anual data
    station = {
            "montly_data" : montly_data,
            "anual_data" : anual_data
            }

    station_data_dictionary["station_data"][station_id] = station
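Since the code above cannot be run without the original data, here is a self-contained sketch of the same pattern. The station_records list is a hypothetical stand-in for station_data_files_pandas, and the values are made up; only the way the dictionary is built matters.

```python
# hypothetical stand-in for station_data_files_pandas: one record per station
station_records = [
    {"STATIONS_ID": 403.0, "Y_RR": [1.0, 2.0]},
    {"STATIONS_ID": 573.0, "Y_RR": [3.0, 4.0]},
    {"STATIONS_ID": 96.0, "Y_RR": [5.0, 6.0]},
]

# create the outer dictionary once, before the loop
station_data_dictionary = {"station_data": {}}

for record in station_records:
    station_id = str(int(record["STATIONS_ID"]))
    station = {"anual_data": {"Y_RR": record["Y_RR"]}}
    # add one entry per station instead of rebuilding the whole dictionary
    station_data_dictionary["station_data"][station_id] = station

station_ids = list(station_data_dictionary["station_data"])
```

Because the outer dictionary is no longer reassigned inside the loop, every station keeps its own entry, and station_data_dictionary["station_data"]["573"]["anual_data"]["Y_RR"] works as requested.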

Note that you don't need statements like i = 0 before a for loop as the loop initialises the variable for you.
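A tiny sketch of that point: whatever value i holds before the loop is simply replaced on the first iteration.

```python
i = 99  # any earlier value is replaced as soon as the loop starts

seen = []
for i in range(3):
    seen.append(i)
```

After the loop, seen contains 0, 1 and 2, and i holds the last value the loop assigned.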

Also the "station_data" layer of the dictionary seems superfluous as it is the only key at that layer, but you had it in your required output so I left it in.

Upvotes: 1
