R. Cox

Reputation: 879

Apply function to Dictionary of Dataframes

I have a dictionary of dataframes, Di_N. How can I apply the same functions to each dataframe please?

The names of the dataframes (the dictionary keys) are generated from the data, so they are not defined in the code.

The following code has been edited to follow jpp's answer ("Iterate your dictionary keys and modify the dataframe for each key sequentially"):

import pandas as pd
import numpy  as np
import copy

# Data
df_1 = pd.DataFrame({'Box' : [1006,1006,1006,1006,1006,1006,1007,1007,1007,1007,1008,1008,1008,1009,1009,1010,1011,1011,1012,1013],
                     'Item': [  40,  41,  42,  43,  44,  45,  40,  43,  44,  45,  43,  44,  45,  40,  41,  40,  44,  45,  44,  45]})


df_Y = pd.DataFrame({'Box' : [1006,1007,1008,1009,1010,1011,1012,1013,1014],
                     'Type': [ 103, 101, 102, 102, 102, 103, 103, 103, 103]})

# Find whether each Box contains each Item
def is_number(s):
    try:
        float(s)
        return 1
    except ValueError:
        return 0
df_1['Thing'] = df_1['Item'].apply(is_number)

# Join
# how='outer' must be passed to join(), not set_index(); otherwise the default left join drops Box 1014
df_N = df_1.set_index('Box').join(df_Y.set_index('Box'), how='outer')

# Find how many Boxes there are of each Type
def fun(g):
    # Number of rows in the group, as a float (no division happens here, so no ZeroDivisionError to catch)
    return float(g.shape[0])
df_T = df_Y.groupby('Type').apply(fun).to_frame().transpose()

# Map of Box Type
Ma_G = df_N.groupby('Type')

# Group the Boxes by Type
Di_1 = {}
for name, group in Ma_G:
    Di_1[str(name)] = group

Di_2 = copy.deepcopy(Di_1)
Di_3 = {}

# fun() (defined above) counts the rows in a group; it is reused below to count the rows per Item

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

for k in Di_1:

    # Table of which Item is in which Box
    Di_2[k] = pd.pivot_table(Di_1[k], values='Thing', columns='Item', index=['Box'], aggfunc='sum').fillna(0)

    # Find the Mean of how many times each Item is in a Box
    Di_3[k] = Di_1[k].groupby('Item').apply(fun).to_frame().transpose()
    Di_3[k] = (Di_3[k].loc[0] / len(Di_1[k].index)).to_frame().transpose()

Di_4 = copy.deepcopy(Di_2)
for k in Di_1:

    # Compare each Box to the Mean - is this valid?
    Di_4[k] = pd.DataFrame(Di_2[k].values - Di_3[k].values, columns=Di_2[k].columns, index=Di_2[k].index)

    for c in [c for c in Di_4[k].columns if Di_4[k][c].dtype in numerics]:
        Di_4[k][c] = Di_4[k][c].abs()

    Di_2[k]['Unusualness'] = Di_4[k].sum(axis=1)

Upvotes: 1

Views: 1917

Answers (1)

jpp

Reputation: 164673

Just iterate your dictionary keys and modify the dataframe for each key sequentially. Here's some pseudo-code to demonstrate how you can do this:

for k in Di_N:
    Di_N[k] = pd.pivot_table(Di_N[k], values='Thing', ...).fillna(0)
    ....
    df_3 = ....
    df_4 = pd.DataFrame(Di_N[k].values - .... )
    Di_N[k]['Unusualness'] = df_4.sum(axis=1)
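
As a runnable sketch of the loop above: the `frames` dictionary, its keys, and the toy data are hypothetical stand-ins for Di_N, and the per-Item mean is taken over the Boxes of each group, which is an assumption and may differ from the calculation in the question. Subtracting with `sub(..., axis=1)` aligns on column labels rather than relying on the column order of raw `.values`.

```python
import pandas as pd

# Hypothetical stand-in for Di_N: two small frames keyed by Box Type.
frames = {
    "101": pd.DataFrame({"Box": [1, 1, 2], "Item": [40, 41, 40], "Thing": [1, 1, 1]}),
    "102": pd.DataFrame({"Box": [3, 4, 4], "Item": [40, 40, 41], "Thing": [1, 1, 1]}),
}

results = {}
for key, df in frames.items():
    # Box-by-Item table: 1 where the Box holds the Item, 0 otherwise.
    table = pd.pivot_table(df, values="Thing", index="Box", columns="Item",
                           aggfunc="sum", fill_value=0)
    # Mean occupancy per Item across the Boxes of this group (assumed definition).
    item_mean = table.mean(axis=0)
    # Subtract the per-Item mean from every row, aligned on column labels.
    deviation = table.sub(item_mean, axis=1).abs()
    table["Unusualness"] = deviation.sum(axis=1)
    results[key] = table
```

Writing into a separate `results` dictionary leaves the original frames untouched, which may make it easier to rerun the loop while experimenting.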

There are a few bits you don't need to include in your loop, e.g. the definitions of fun() and numerics. Put these outside the loop; you can still reference them inside it.
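
One way to picture this is a helper defined once and then applied to every dataframe with a dictionary comprehension. This is only a sketch: `item_share` and `frames` are hypothetical names, and `groupby(...).size()` is swapped in for `apply(fun)` as an equivalent way to count rows per Item.

```python
import pandas as pd

def item_share(df):
    """Return the fraction of rows in df that belong to each Item."""
    return df.groupby("Item").size() / len(df)

# Hypothetical stand-in for the dictionary of dataframes.
frames = {
    "101": pd.DataFrame({"Box": [1, 1, 2], "Item": [40, 41, 40]}),
    "102": pd.DataFrame({"Box": [3, 4, 4], "Item": [40, 40, 41]}),
}

# The helper is defined once, outside the loop, and reused for every key.
shares = {key: item_share(df) for key, df in frames.items()}
```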

In addition, you can use pd.DataFrame.select_dtypes to select numeric columns:

num_cols = df_4.select_dtypes(include=numerics).columns
df_4[num_cols] = df_4[num_cols].abs()
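
A small self-contained illustration of the same idea, assuming a hypothetical frame `df_demo` with one non-numeric column. Passing `include="number"` selects all numeric dtypes without listing them one by one.

```python
import pandas as pd

# Hypothetical frame with two numeric columns and one string column.
df_demo = pd.DataFrame({"a": [-1, 2], "b": [-0.5, 1.5], "label": ["x", "y"]})

# Pick out the numeric columns only, then take absolute values in place.
num_cols = df_demo.select_dtypes(include="number").columns
df_demo[num_cols] = df_demo[num_cols].abs()
```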

Upvotes: 2
