dgketchum
dgketchum

Reputation: 311

How to access pandas multiindex by element within a nested dict?

I have a dict of types of regions, within each a dict of sub-regions, and within each of those a pandas dataframe object, indexed to the period from which I need to compute each parameter (column) time series. Additionally, I need it in two units.

So I created something like this:

regions = ['region_x', 'region_y']
sub_regions = ['a', 'b', 'c']
parameters = ['x', 'y', 'z']
units = ['af', 'cbm']
start = datetime(2000, 01, 01)
end = datetime(2000, 01, 03)

arrays = [parameters * 2, units * 3]

cols = pd.MultiIndex.from_arrays(arrays)
empty_df = pd.DataFrame(index=pd.date_range(start, end), columns=cols).fillna(0.0)

tab_dict = {}
for region in regions:
    tab_dict.update({region: {}})
    for sub_region in sub_regions:
        tab_dict[region].update({sub_region: empty_df})

Which returns

{'region_y':
 {'a':       x    y    z    x    y    z
             af  cbm   af  cbm   af  cbm
2000-01-01  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-02  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-03  0.0  0.0  0.0  0.0  0.0  0.0, 
'c':         x    y    z    x    y    z
             af  cbm   af  cbm   af  cbm
2000-01-01  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-02  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-03  0.0  0.0  0.0  0.0  0.0  0.0,
 'b':        x    y    z    x    y    z
             af  cbm   af  cbm   af  cbm
2000-01-01  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-02  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-03  0.0  0.0  0.0  0.0  0.0  0.0},
 'region_x':
 {'a':       x    y    z    x    y    z
             af  cbm   af  cbm   af  cbm
2000-01-01  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-02  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-03  0.0  0.0  0.0  0.0  0.0  0.0, 
'c':         x    y    z    x    y    z
             af  cbm   af  cbm   af  cbm
2000-01-01  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-02  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-03  0.0  0.0  0.0  0.0  0.0  0.0,
'b':         x    y    z    x    y    z
             af  cbm   af  cbm   af  cbm
2000-01-01  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-02  0.0  0.0  0.0  0.0  0.0  0.0
2000-01-03  0.0  0.0  0.0  0.0  0.0  0.0}}

Now I need to extract a value from each day (using np.random here) and somehow insert that into it's proper place. I have had success getting into a single-nested dict and updating a DataFrame object (using dict_[key].loc[date] = x), but a 'similar' approach here returns SettingWithCopyWarning and does not update the dataframes.

for day in rrule.rrule(rrule.DAILY, dtstart=start, until=end):
    for region in regions:
        for sub_region in sub_regions:
            for parameter in parameters:
                for unit in units:
                    unit_af = np.random.randint(100)
                    unit_cbm = unit_af * 2
                    tab_dict[region][sub_region][parameter]['af'].loc[day] = unit_af
                    tab_dict[region][sub_region][parameter]['cbm'].loc[day] = unit_cbm

It just returns what I had started with. I would greatly appreciate any advice on how to update these values. Excuse the messy code, this was the simplest I could write to reproduce my (much uglier) problem.

Upvotes: 1

Views: 299

Answers (1)

piRSquared
piRSquared

Reputation: 294218

specify both index and column in loc
Try

for day in rrule.rrule(rrule.DAILY, dtstart=start, until=end):
    for region in regions:
        for sub_region in sub_regions:
            for parameter in parameters:
                for unit in units:
                    unit_af = np.random.randint(100)
                    unit_cbm = unit_af * 2
                    tab_dict[region][sub_region][parameter].loc[day, 'af'] = unit_af
                    tab_dict[region][sub_region][parameter].loc[day, 'cbm'] = unit_cbm

Upvotes: 2

Related Questions