Reputation: 13
I have a log file with timestamps and two columns. I would now like to resample and "pivot" the dateframe created from the logfile.
Example orig dataframe/log file:
timestamp colA colB
2015-01-01 00:10:01 a x
2014-01-01 00:10:01 b y
2015-01-01 00:10:03 a x
2015-01-01 00:10:03 a x
2015-01-01 00:10:03 a y
2015-01-01 00:10:04 b x
2014-01-01 00:10:04 b y
2014-01-01 00:10:04 b y
2014-01-01 00:10:04 a x
2014-01-01 00:10:05 a x
2014-01-01 00:10:05 a x
2014-01-01 00:10:07 a y
2014-01-01 00:10:08 a x
Example result for resampling by seconds:
a b
timestamp x y x y
2015-01-01 00:10:01 1 0 0 1
2015-01-01 00:10:02 0 0 0 0
2015-01-01 00:10:03 2 1 0 0
2015-01-01 00:10:04 1 0 1 2
2014-01-01 00:10:05 2 0 0 0
2014-01-01 00:10:06 0 0 0 0
2014-01-01 00:10:07 0 1 0 0
2014-01-01 00:10:08 1 0 0 0
How would I achieve this? Resample first, then groupby/pivot? Or the other way around? To be more specific I the cells should contain the count for colA/colB combinations for each specific resample timeintervall. In the example seconds, but it could be minutes, hours, etc.
I am not fixed on this format, I could also think of getting a result that is resampled and groupby timestamp/colA like
colB
timestamp colA x y
2015-01-01 00:10:01 a 1 0
b 0 1
2015-01-01 00:10:02 a 0 0
b 0 0
2015-01-01 00:10:03 a 2 1
b 0 0
2015-01-01 00:10:04 a 1 0
b 1 2
2014-01-01 00:10:05 a 2 0
b 0 0
2014-01-01 00:10:06 a 0 0
b 0 0
2014-01-01 00:10:07 a 0 1
b 0 0
2014-01-01 00:10:08 a 1 0
b 0 0
The final usage would be to plot the different count values
THX.
Upvotes: 1
Views: 1037
Reputation: 879501
You could use pd.crosstab
:
import numpy as np
import pandas as pd
df = pd.read_table('data', sep='\s{2,}', parse_dates=[0])
table = pd.crosstab(index=[df['timestamp']], columns=[df['colA'], df['colB']])
yields
colA a b
colB x y x y
timestamp
2014-01-01 00:10:01 0 0 0 1
2014-01-01 00:10:04 1 0 0 2
2014-01-01 00:10:05 2 0 0 0
2014-01-01 00:10:07 0 1 0 0
2014-01-01 00:10:08 1 0 0 0
2015-01-01 00:10:01 1 0 0 0
2015-01-01 00:10:03 2 1 0 0
2015-01-01 00:10:04 0 0 1 0
Upvotes: 1