Mark K
Mark K

Reputation: 9348

Dataframe time series transpose

A data frame as below, I want to resample them by using 'time', 'cars', and 'flow' and transpose them into columns.

enter image description here

So I need to:

  1. add rows of missing minutes, from 09:00 to 10:00, per minute
  2. resample it every 30-minute

The final result should look like: enter image description here

There are steps behind and I tried my best but it only ends up with:

data = {'time': ["9:17",    "9:17", "9:20", "9:21", "9:22", "9:23", "9:23", "9:25", "9:26", "9:27", "9:36", "9:36"], 
'tempreture': [37.7,    37.12,  37.7,   37.11,  37.9,   37.1,   37.8,   37.2,   37.6,   37.7,   37.1,   37.3],
'cars' : [737,  22, 42, 1,  45, 13, 47, 304,    10, 192,    5,  16],
'flow': ["In","Out","In","Unknown","Out","In","Out","In","Unknown","Out","In","In"]}
df = pd.DataFrame(data)

df = df[['time', 'cars', 'flow']]

idx = pd.date_range("9:00", "10:00", freq="1min")
idx = idx.rename('time')

df.index = pd.to_datetime(df.index)
df = df.reindex(idx, fill_value=0)

df = df.pivot(columns='flow').resample('30T').sum()
df = df.reset_index()

df = df.set_index('time')
df.index = pd.to_datetime(df.index)
df = df.stack().T
df["Total"] = df.sum(axis=1)

print (df)


time 2020-10-21 09:00:00 2020-10-21 09:30:00 2020-10-21 10:00:00 Total
flow                   0                   0                   0      
cars                   0                   0                   0     0

What's the right way to write it?

Upvotes: 1

Views: 847

Answers (1)

jezrael
jezrael

Reputation: 862641

Use DataFrame.pivot_table with aggregate sum, then DataFrame.reindex with DataFrame.resample and last convert to one row DataFrame by DataFrame.stack, Series.to_frame and transpose with flatten MultiIndex:

df = df[['time', 'cars', 'flow']]


df = df.pivot_table(index='time',columns='flow', values='cars', aggfunc='sum')
df.index = pd.to_datetime(df.index + ':00')
idx = pd.date_range("9:00", "10:00", freq="1min", name='time')

df = df.reindex(idx, fill_value=0).resample('30T').sum()
df['Total'] = df.sum(axis=1)
df.index = df.index.strftime('%H:%M')
df = df.stack().to_frame().T
df.columns = df.columns.map(lambda x: f'{x[0]} {x[1]}')
print (df)
   09:00 In  09:00 Out  09:00 Unknown  09:00 Total  09:30 In  09:30 Out  \
0    1096.0      306.0           11.0       1413.0      21.0        0.0   

   09:30 Unknown  09:30 Total  10:00 In  10:00 Out  10:00 Unknown  10:00 Total  
0            0.0         21.0       0.0        0.0            0.0          0.0  

But it seems reindex here is not necessary:

df = df.pivot_table(index='time',columns='flow', values='cars', aggfunc='sum')
df.index = pd.to_datetime(df.index + ':00')

df = df.resample('30T').sum()
df['Total'] = df.sum(axis=1)
df.index = df.index.strftime('%H:%M')
df = df.stack().to_frame().T
df.columns = df.columns.map(lambda x: f'{x[0]} {x[1]}')

Upvotes: 1

Related Questions