Adam R. Jensen
Adam R. Jensen

Reputation: 636

How to split Pandas Series into a DataFrame with columns for each hour of day?

I have a Pandas Series of solar radiation values with the index being timestamps with a one minute resolution. E.g.:

index              solar_radiation
2019-01-01 08:01          0
2019-01-01 08:02         10
2019-01-01 08:03         15
...
2019-01-10 23:59          0

I would like to convert this to a table (DataFrame) where each hour is averaged into one column, e.g.:

index           00  01  02  03  04  05  06 ... 23
2019-01-01       0   0   0   0   0   3  10 ... 0
2019-01-02       0   0   0   0   0   4  12 ... 0
....
2019-01-10       0   0   0   0   0   6  24...  0

I have tried to look into Groupby, but there I am only able to group hours into one combined bin and not one for each day... any hints or suggestions as to how I can achive this with groupby or should I just brute force it and iterate over each hour?

Upvotes: 1

Views: 242

Answers (2)

jezrael
jezrael

Reputation: 863256

Solutions for one column DataFrame:

Aggregate mean by DatetimeIndex with DatetimeIndex.floor for remove times and DatetimeIndex.hour, reshape by Series.unstack and add missing values by DataFrame.reindex:

#if necessary
#df.index = pd.to_datetime(df.index)
rng = pd.date_range(df.index.min().floor('D'), df.index.max().floor('D'))
df1 = (df.groupby([df.index.floor('D'), df.index.hour])['solar_radiation']
         .mean()
         .unstack(fill_value=0)
         .reindex(columns=range(0, 24), fill_value=0, index=rng))

Another solution with Grouper by hour, replace missing values to 0 and reshape by Series.unstack:

#if necessary
#df.index = pd.to_datetime(df.index)

df1 = df.groupby(pd.Grouper(freq='H'))[['solar_radiation']].mean().fillna(0)
df1 = df1.set_index([df1.index.date, df1.index.hour])['solar_radiation'].unstack(fill_value=0)
print (df1)
             0    1    2    3    4    5    6    7         8    9   ...   14  \
2019-01-01  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  8.333333  0.0  ...  0.0   
2019-01-02  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.0  ...  0.0   
2019-01-03  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.0  ...  0.0   
2019-01-04  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.0  ...  0.0   
2019-01-05  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.0  ...  0.0   
2019-01-06  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.0  ...  0.0   
2019-01-07  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.0  ...  0.0   
2019-01-08  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.0  ...  0.0   
2019-01-09  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.0  ...  0.0   
2019-01-10  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.000000  0.0  ...  0.0   

             15   16   17   18   19   20   21   22   23  
2019-01-01  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  
2019-01-02  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  
2019-01-03  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  
2019-01-04  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  
2019-01-05  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  
2019-01-06  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  
2019-01-07  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  
2019-01-08  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  
2019-01-09  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  
2019-01-10  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  

[10 rows x 24 columns]

Solutions for Series with DatetimeIndex:

rng = pd.date_range(df.index.min().floor('D'), df.index.max().floor('D'))
df1 = (df.groupby([df.index.floor('D'), df.index.hour])
         .mean()
         .unstack(fill_value=0)
         .reindex(columns=range(0, 24), fill_value=0, index=rng))

df1 = df.groupby(pd.Grouper(freq='H')).mean().to_frame('new').fillna(0)
df1 = df1.set_index([df1.index.date, df1.index.hour])['new'].unstack(fill_value=0)

Upvotes: 1

Erfan
Erfan

Reputation: 42916

If I understand you correctly, you want to use resample hourly. Then we can make a MultiIndex with date and hour, then we unstack the hour index to columns:

df = df.resample('H').mean()
df.set_index([df.index.date, df.index.time], inplace=True)
df = df.unstack(level=[1])

Which gives us the following output:

print(df)
           solar_radiation                                               \
                  00:00:00 01:00:00 02:00:00 03:00:00 04:00:00 05:00:00   
2019-01-01             NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-02             NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-03             NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-04             NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-05             NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-06             NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-07             NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-08             NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-09             NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-10             NaN      NaN      NaN      NaN      NaN      NaN   

                                                 ...                    \
           06:00:00 07:00:00  08:00:00 09:00:00  ... 14:00:00 15:00:00   
2019-01-01      NaN      NaN  8.333333      NaN  ...      NaN      NaN   
2019-01-02      NaN      NaN       NaN      NaN  ...      NaN      NaN   
2019-01-03      NaN      NaN       NaN      NaN  ...      NaN      NaN   
2019-01-04      NaN      NaN       NaN      NaN  ...      NaN      NaN   
2019-01-05      NaN      NaN       NaN      NaN  ...      NaN      NaN   
2019-01-06      NaN      NaN       NaN      NaN  ...      NaN      NaN   
2019-01-07      NaN      NaN       NaN      NaN  ...      NaN      NaN   
2019-01-08      NaN      NaN       NaN      NaN  ...      NaN      NaN   
2019-01-09      NaN      NaN       NaN      NaN  ...      NaN      NaN   
2019-01-10      NaN      NaN       NaN      NaN  ...      NaN      NaN   

                                                                           \
           16:00:00 17:00:00 18:00:00 19:00:00 20:00:00 21:00:00 22:00:00   
2019-01-01      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-02      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-03      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-04      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-05      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-06      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-07      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-08      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-09      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-01-10      NaN      NaN      NaN      NaN      NaN      NaN      NaN   


           23:00:00  
2019-01-01      NaN  
2019-01-02      NaN  
2019-01-03      NaN  
2019-01-04      NaN  
2019-01-05      NaN  
2019-01-06      NaN  
2019-01-07      NaN  
2019-01-08      NaN  
2019-01-09      NaN  
2019-01-10      0.0  

[10 rows x 24 columns]

Note I got a lot of NaN since you provided only couple of rows data.

Upvotes: 3

Related Questions