durbachit

Reputation: 4886

Grouping columns of pandas dataframe in datetime format

I have two questions:

1) Is there something like pandas groupby but applicable to columns (df.columns, not the data within)?

2) How can I extract the "date" from a datetime object?

I have lots of pandas dataframes (or csv files) that have a position column (which I use as index) and then columns of values measured at each position at different times. The column headers are datetime objects (created with pd.to_datetime). I would like to extract data from the same date and save them into a new file.

Here is a simple example of two such dataframes.

df1:

     2015-03-13 14:37:00  2015-03-13 14:38:00  2015-03-13 14:38:15  \
0.0             24.49393             24.56345             24.50552   
0.5             24.45346             24.54904             24.60773   
1.0             24.46216             24.55267             24.74365   
1.5             24.55414             24.63812             24.80463   
2.0             24.68079             24.76758             24.78552   
2.5             24.79236             24.83005             24.72879   
3.0             24.83691             24.78308             24.66727   
3.5             24.78452             24.73071             24.65085   
4.0             24.65857             24.79398             24.72290   
4.5             24.56390             24.93515             24.83267   
5.0             24.62161             24.96939             24.87366   

     2015-05-19 11:33:00  2015-05-19 11:33:15  2015-05-19 11:33:30  
0.0             8.836121             8.726685             8.710449  
0.5             8.732880             8.742462             8.687408  
1.0             8.881165             8.935120             8.925903  
1.5             9.043396             9.092651             9.204041  
2.0             9.080902             9.153839             9.329681  
2.5             9.128815             9.183777             9.296509  
3.0             9.191254             9.121643             9.207397  
3.5             9.131866             8.975372             9.160248  
4.0             8.966003             8.951813             9.195221  
4.5             8.846924             9.074982             9.264099  
5.0             8.848663             9.101593             9.283081  

and df2:

     2015-05-19 11:33:00  2015-05-19 11:33:15  2015-05-19 11:33:30  \
0.0             8.836121             8.726685             8.710449   
0.5             8.732880             8.742462             8.687408   
1.0             8.881165             8.935120             8.925903   
1.5             9.043396             9.092651             9.204041   
2.0             9.080902             9.153839             9.329681   
2.5             9.128815             9.183777             9.296509   
3.0             9.191254             9.121643             9.207397   
3.5             9.131866             8.975372             9.160248   
4.0             8.966003             8.951813             9.195221   
4.5             8.846924             9.074982             9.264099   
5.0             8.848663             9.101593             9.283081   

     2015-05-23 12:25:00  2015-05-23 12:26:00  2015-05-23 12:26:30  
0.0             10.31052            10.132660            10.176910  
0.5             10.26834            10.086910            10.252720  
1.0             10.27393            10.165890            10.276670  
1.5             10.29330            10.219090            10.335910  
2.0             10.24432            10.193940            10.406430  
2.5             10.11618            10.157470            10.323120  
3.0             10.02454            10.110720            10.115360  
3.5             10.08716            10.010680             9.997345  
4.0             10.23868             9.905670            10.008090  
4.5             10.27216             9.879425             9.979645  
5.0             10.10693             9.919800             9.870361  

df1 has data from 13 March and 19 May, df2 has data from 19 May and 23 May. From these two dataframes containing data from 3 days, I would like to get 3 dataframes (or csv files or any other object), one for each day. (And for a real-life example, multiply the number of rows, columns and files by a few hundred.)

In the worst case I can specify the dates in a separate list, but I am still failing to extract these dates from the dataframes. I did have an idea of a nested loop

for df in dataframes:
    for d in dates:
        new_df = df[d]

but I can't get the date from the datetime.
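The closest I have got to pulling the dates out of the headers is parsing them with pd.to_datetime and formatting back with strftime (a minimal sketch with made-up headers):

```python
import pandas as pd

# column headers parsed as datetimes become a DatetimeIndex
cols = pd.to_datetime(['2015-03-13 14:37:00', '2015-05-19 11:33:00'])

# strftime on the index gives the date part as strings;
# cols.date would give datetime.date objects instead
dates = cols.strftime('%Y-%m-%d')
print(list(dates))  # ['2015-03-13', '2015-05-19']
```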

Upvotes: 0

Views: 76

Answers (1)

jezrael

Reputation: 862641

First concat all the DataFrames along the columns, then group by the column labels formatted with strftime, and convert the groupby object into a dictionary of DataFrames keyed by date strings:

df = pd.concat([df1, df2, dfN], axis=1)

# group columns by day; keys are '%Y-%m-%d' strings
dfs = dict(tuple(df.groupby(df.columns.strftime('%Y-%m-%d'), axis=1)))

# select one day's DataFrame
print(dfs['2015-03-13'])
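Note that groupby(..., axis=1) was deprecated and later removed in pandas 2.x; the same result can be had by grouping the transposed frame. A self-contained sketch with tiny stand-ins for df1/df2 (values and filenames are made up), which also writes one csv per day as the question asks:

```python
import pandas as pd

# tiny stand-ins for the real df1/df2
df1 = pd.DataFrame(
    [[24.49, 8.83], [24.45, 8.73]],
    index=[0.0, 0.5],
    columns=pd.to_datetime(['2015-03-13 14:37:00', '2015-05-19 11:33:00']))
df2 = pd.DataFrame(
    [[8.84, 10.31], [8.85, 10.27]],
    index=[0.0, 0.5],
    columns=pd.to_datetime(['2015-05-19 11:33:15', '2015-05-23 12:25:00']))

df = pd.concat([df1, df2], axis=1)

# group the transposed frame's rows (= original columns) by calendar day,
# then transpose each group back
dfs = {day: sub.T
       for day, sub in df.T.groupby(df.columns.strftime('%Y-%m-%d'))}

for day, sub in dfs.items():
    sub.to_csv(f'{day}.csv')  # one file per day (hypothetical filenames)
```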

Upvotes: 1

Related Questions