Robert Garrison
Robert Garrison

Reputation: 131

General Script to Concatenate Series of DataFrames

I have a dictionary of over 1000 dataframes (df_dict) where some elements may be series:

Schedule_RCL_03312007

Schedule_RCL_03312008

Schedule_RCL_03312009

Schedule_RCL_03312010(1_of_2)

Schedule_RCL_03312010(2_of_2)

Schedule_RCL_03312011(1_of_2)

Schedule_RCL_03312011(2_of_2)

Schedule_RCL_06302011(1_of_2)

Schedule_RCL_06302011(2_of_2)

How would I selectively horizontally concatenate the serial DataFrames (e.g. (1 of 2) and (2 of 2)) in a general fashion? I know how to perform the function for the specific situation but I need something general for when the data may change over time.

Upvotes: 0

Views: 40

Answers (1)

Stefan
Stefan

Reputation: 42885

You could use itertools.groupby:

from itertools import groupby
from operator import itemgetter
df_names = ['Schedule_RCL_03312007', 'Schedule_RCL_03312008', 'Schedule_RCL_03312009', 'Schedule_RCL_03312010(1_of_2)', 'Schedule_RCL_03312010(2_of_2)', 'Schedule_RCL_03312011(1_of_2)', 'Schedule_RCL_03312011(2_of_2)', 'Schedule_RCL_06302011(1_of_2)', 'Schedule_RCL_06302011(2_of_2)']

for key, grp in groupby(sorted([d.split('(') for d in df_names]), key=itemgetter(0)): # use df_dict.keys() instead of df_names
    frames = ['('.join(f) for f in grp]
    if len(frames) > 1:
        print(key, frames)
        df_dict[key] = pd.concat([df_dict[f] for f in frames], axis=1)
        [df_dict.pop(key, None) for key in frames] # optional: remove serial frames 

which prints:

Schedule_RCL_03312010 ['Schedule_RCL_03312010(1_of_2)', 'Schedule_RCL_03312010(2_of_2)']
Schedule_RCL_03312011 ['Schedule_RCL_03312011(1_of_2)', 'Schedule_RCL_03312011(2_of_2)']
Schedule_RCL_06302011 ['Schedule_RCL_06302011(1_of_2)', 'Schedule_RCL_06302011(2_of_2)']

Since you are using a dict, you could use df_dict.keys() instead.

Upvotes: 2

Related Questions