Calling Pandas Data Frames Created with globals() Inside For Loop

Question

I am iterating through 50 files in Python and loading each one into a pandas data frame. Then from each data frame I create three new data frames based on the values in a specific field in the original data frame. These three new frames have new names that include the value they were filtered on.

It works, yay! I get all my data frames!

The problem is, I'm creating these data frames using a globals() call, and I do not know how to access them without explicitly typing each individual data frame name into a kernel.

Why do I want to do this, you may ask?

Well, I want to grab all of the data frames that end in 'cd', for example, and append (union all) them into a final data frame. I don't want to have to explicitly call all 50 of them. I want to loop through a list of the data frames to accomplish this task.

Any suggestions on how to accomplish this, or rework the code?

I'm new to these more intensive processes with IPython, so change whatever.

    import os
    import re

    import pandas as pd

    filelist = os.listdir()
    sum_list = ['CAKE', 'TWINKIES', 'DOUGHNUTS', 'CUPCAKES']
    for f in filelist:
        # raw string so the regex backslashes are not treated as string escapes
        state = re.match(r'((\w+){2})_', f)
        state_df = str(state.group(1)) + '_df'
        data = pd.read_csv(f, low_memory=False)
        df = pd.DataFrame(data)
        for x in sum_list:
            sdo = state_df + '_' + x.lower()
            globals()[sdo] = pd.DataFrame(df.loc[df['summary_level'] == x])

Andy Hayden · Accepted Answer

I think a much better way is to create your own dictionary rather than resort to globals! Store each DataFrame in that dictionary (or in a list, or a dictionary of lists, depending on how you want to classify them):

    dfs = {}
    for f in filelist:
        ...
        df = pd.read_csv(f)  # this returns a DataFrame
        for x in sum_list:
            ...
            dfs[sdo] = df[df.summary_level == x]  # again, this returns a DataFrame
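
With the frames in a dictionary, the question's goal of unioning every frame whose name ends in a given suffix comes down to filtering the keys and passing the matches to pd.concat. A minimal sketch, assuming two small in-memory CSVs in place of the 50 files and a two-letter state prefix in each file name; the file names, the column values, and the helper build_frames are hypothetical:

```python
import io
import re

import pandas as pd

# Hypothetical stand-ins for the real files: file name -> CSV text.
fake_files = {
    "ca_2020.csv": "summary_level,value\nCAKE,1\nTWINKIES,2\n",
    "ny_2020.csv": "summary_level,value\nCAKE,3\nTWINKIES,4\n",
}
sum_list = ["CAKE", "TWINKIES"]


def build_frames(files, levels):
    """Return a dict mapping names like 'ca_df_cake' to filtered DataFrames."""
    dfs = {}
    for name, text in files.items():
        # Assumption: the file name starts with a two-letter state code and '_'.
        state = re.match(r"(\w{2})_", name).group(1)
        df = pd.read_csv(io.StringIO(text))
        for level in levels:
            key = f"{state}_df_{level.lower()}"
            dfs[key] = df[df["summary_level"] == level]
    return dfs


dfs = build_frames(fake_files, sum_list)

# Union every frame whose key ends in 'cake' into one DataFrame.
cake_keys = [k for k in dfs if k.endswith("cake")]
all_cake = pd.concat([dfs[k] for k in cake_keys], ignore_index=True)
```

The same key filter works for any suffix, so no frame name ever has to be typed by hand.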

You could use a defaultdict, and assign each to a sub-dictionary:

    from collections import defaultdict
    dfs = defaultdict(dict)  # defaultdict needs a callable, so pass dict rather than {}
    ...
            dfs[x][sdo] = ...

i.e. dfs['CAKE'] will be a dictionary holding all the CAKE DataFrames.
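
As a rough end-to-end sketch of the grouped version: the per-state frames below are hypothetical stand-ins for the CSV files, and the final pd.concat step is an addition aimed at the "union all" goal from the question rather than part of the answer itself:

```python
from collections import defaultdict

import pandas as pd

# Hypothetical per-state frames standing in for the files read with read_csv.
frames = {
    "ca": pd.DataFrame({"summary_level": ["CAKE", "CUPCAKES"], "value": [1, 2]}),
    "ny": pd.DataFrame({"summary_level": ["CAKE", "CUPCAKES"], "value": [3, 4]}),
}

# defaultdict(dict) creates an empty sub-dictionary the first time a key is used.
dfs = defaultdict(dict)
for state, df in frames.items():
    for level in ["CAKE", "CUPCAKES"]:
        name = f"{state}_df_{level.lower()}"
        dfs[level][name] = df[df["summary_level"] == level]

# Stack all CAKE frames into a single DataFrame.
all_cake = pd.concat(list(dfs["CAKE"].values()), ignore_index=True)
```

Grouping by summary level up front means no string matching on frame names is needed later; dfs["CAKE"].values() already yields exactly the frames to combine.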
