Reputation: 131
I have a dictionary of over 1000 dataframes (df_dict) where some elements may be series:
Schedule_RCL_03312007
Schedule_RCL_03312008
Schedule_RCL_03312009
Schedule_RCL_03312010(1_of_2)
Schedule_RCL_03312010(2_of_2)
Schedule_RCL_03312011(1_of_2)
Schedule_RCL_03312011(2_of_2)
Schedule_RCL_06302011(1_of_2)
Schedule_RCL_06302011(2_of_2)
How would I selectively horizontally concatenate the serial DataFrames (e.g. (1 of 2) and (2 of 2)) in a general fashion? I know how to perform the function for the specific situation but I need something general for when the data may change over time.
Upvotes: 0
Views: 40
Reputation: 42885
You could use itertools.groupby
:
from itertools import groupby
from operator import itemgetter
df_names = ['Schedule_RCL_03312007', 'Schedule_RCL_03312008', 'Schedule_RCL_03312009', 'Schedule_RCL_03312010(1_of_2)', 'Schedule_RCL_03312010(2_of_2)', 'Schedule_RCL_03312011(1_of_2)', 'Schedule_RCL_03312011(2_of_2)', 'Schedule_RCL_06302011(1_of_2)', 'Schedule_RCL_06302011(2_of_2)']
for key, grp in groupby(sorted([d.split('(') for d in df_names]), key=itemgetter(0)): # use df_dict.keys() instead of df_names
frames = ['('.join(f) for f in grp]
if len(frames) > 1:
print(key, frames)
df_dict[key] = pd.concat([df_dict[f] for f in frames], axis=1)
[df_dict.pop(key, None) for key in frames] # optional: remove serial frames
which prints:
Schedule_RCL_03312010 ['Schedule_RCL_03312010(1_of_2)', 'Schedule_RCL_03312010(2_of_2)']
Schedule_RCL_03312011 ['Schedule_RCL_03312011(1_of_2)', 'Schedule_RCL_03312011(2_of_2)']
Schedule_RCL_06302011 ['Schedule_RCL_06302011(1_of_2)', 'Schedule_RCL_06302011(2_of_2)']
Since you are using a dict
, you could use df_dict.keys()
instead.
Upvotes: 2