Reputation: 319
Consider the dataframe "data", indexed by Timestamp, as follows:
Index Receiver Type Retry
1970-01-01 00:00:00.000000000 R1 Data 1
1970-01-01 00:00:00.800000000 R1 Non-Data 1
1970-01-01 00:00:01.000287000 R2 Data 0
1970-01-01 00:00:01.600896000 R2 Data 1
1970-01-01 00:00:02.001388000 R1 Data 1
1970-01-01 00:00:02.004698000 R1 Non-Data 1
1970-01-01 00:00:02.006706000 R2 Data 1
1970-01-01 00:00:02.501351000 R2 Data 1
1970-01-01 00:00:02.810382000 R1 Data 0
1970-01-01 00:00:03.001981000 R1 Data 0
1970-01-01 00:00:03.377116000 R1 Data 0
1970-01-01 00:00:03.701811000 R2 Data 1
1970-01-01 00:00:03.910326000 R2 Data 0
1970-01-01 00:00:03.951355000 R2 Non-Data 1
I want to find all rows where Type is "Data" and Retry equals 1, then group the index into 1-second intervals and count the matches for each Receiver.
Desired Output as df2:
Index R1 R2
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
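(For reproducibility: a minimal sketch that rebuilds this frame from the table above. The construction itself is not from the original post; 'Index' is kept as a regular datetime column, matching the set_index('Index') calls in the answers below.)
import pandas as pd

data = pd.DataFrame({
    'Index': pd.to_datetime([
        '1970-01-01 00:00:00.000000000', '1970-01-01 00:00:00.800000000',
        '1970-01-01 00:00:01.000287000', '1970-01-01 00:00:01.600896000',
        '1970-01-01 00:00:02.001388000', '1970-01-01 00:00:02.004698000',
        '1970-01-01 00:00:02.006706000', '1970-01-01 00:00:02.501351000',
        '1970-01-01 00:00:02.810382000', '1970-01-01 00:00:03.001981000',
        '1970-01-01 00:00:03.377116000', '1970-01-01 00:00:03.701811000',
        '1970-01-01 00:00:03.910326000', '1970-01-01 00:00:03.951355000']),
    'Receiver': ['R1', 'R1', 'R2', 'R2', 'R1', 'R1', 'R2',
                 'R2', 'R1', 'R1', 'R1', 'R2', 'R2', 'R2'],
    'Type': ['Data', 'Non-Data', 'Data', 'Data', 'Data', 'Non-Data', 'Data',
             'Data', 'Data', 'Data', 'Data', 'Data', 'Data', 'Non-Data'],
    'Retry': [1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1]})
df = data  # most answers below refer to it as df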
Upvotes: 2
Views: 675
Reputation: 402293
I originally set the bounty because I couldn't find the time to help the OP, but I did manage to find some time and have provided four options, all of which address the OP's question. I've also cleaned up the question to remove redundant/stale information and prevent confusion.
Option 1
Use query/eval/boolean indexing to filter rows, and then use get_dummies + resample to get your output -
df = df.query("Type == 'Data' and Retry == 1")\
.set_index('Index').Receiver.str.get_dummies().resample('1S').sum()
df
R1 R2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
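To see why this works, here is the intermediate frame produced by str.get_dummies on the filtered rows (derived from the example data and starting again from the original, unfiltered df; shown for illustration, not part of the original answer):
# One 0/1 indicator column per receiver; resample('1S').sum()
# then simply counts the 1s that fall inside each second.
dummies = df.query("Type == 'Data' and Retry == 1")\
            .set_index('Index').Receiver.str.get_dummies()
# dummies:
#                                R1  R2
# 1970-01-01 00:00:00.000000000   1   0
# 1970-01-01 00:00:01.600896000   0   1
# 1970-01-01 00:00:02.001388000   1   0
# 1970-01-01 00:00:02.006706000   0   1
# 1970-01-01 00:00:02.501351000   0   1
# 1970-01-01 00:00:03.701811000   0   1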
Similar solution using get_dummies + groupby -
df = df.query("Type == 'Data' and Retry == 1").set_index("Index")\
.Receiver.str.get_dummies().groupby(pd.Grouper(freq='s')).sum()
df
R1 R2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
In fact, resample and groupby + pd.Grouper are pretty much interchangeable operations.
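A quick sanity check of that equivalence (a sketch, assuming s is any Series with a DatetimeIndex; for aggregations like sum the two produce identical bins):
# pd.Grouper(freq=...) on a DatetimeIndex uses the same binning as resample
lhs = s.resample('1S').sum()
rhs = s.groupby(pd.Grouper(freq='1S')).sum()
assert lhs.equals(rhs)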
Option 2
crosstab + resample -
i = df.query("Type == 'Data' and Retry == 1").set_index('Index').Receiver
df = pd.crosstab(i.index, i.values).resample('1S').sum()
df.index.name = df.columns.name = None
df
R1 R2
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
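A variant of Option 2 (my addition, not from the original answer) skips the resample step by flooring each timestamp to its containing second before tabulating. Note that, unlike resample, this drops seconds with no matching rows:
i = df.query("Type == 'Data' and Retry == 1").set_index('Index').Receiver
# Floor timestamps to the second, then count directly with crosstab
out = pd.crosstab(i.index.floor('s'), i.values)
out.index.name = out.columns.name = None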
Option 3
groupby + unstack (I presume this to be the slowest one) -
df = df.query("Type == 'Data' and Retry == 1")\
.set_index('Index')\
.groupby([pd.Grouper(freq='1S'), 'Receiver'])\
.Receiver.count()\
.unstack()\
.fillna(0)
df
Receiver R1 R2
Index
1970-01-01 00:00:00 1.0 0.0
1970-01-01 00:00:01 0.0 1.0
1970-01-01 00:00:02 1.0 2.0
1970-01-01 00:00:03 0.0 1.0
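The float output comes from fillna operating on columns that held NaN; if you want the integer counts shown in the desired output, cast at the end (this applies equally to Option 4 below):
df = df.astype(int)  # restore integer counts after fillna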
Option 4
unstack + resample -
df = df.query("Type == 'Data' and Retry == 1")\
.set_index(['Index', 'Receiver'])\
.assign(foo=1)\
.foo.unstack(-1)\
.resample('s')\
.sum()\
.fillna(0)
df
Receiver R1 R2
Index
1970-01-01 00:00:00 1.0 0.0
1970-01-01 00:00:01 0.0 1.0
1970-01-01 00:00:02 1.0 2.0
1970-01-01 00:00:03 0.0 1.0
Upvotes: 4
Reputation: 30414
Approach 1 -- masking + resample
for r in ['R1', 'R2']:
    df['new' + r] = ((df.Type == 'Data') & (df.Retry == 1) & (df.Receiver == r)).astype(int)
df[['newR1', 'newR2']].resample('s').sum()
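A sketch generalizing Approach 1 to whatever receivers actually appear in the data, rather than hardcoding 'R1' and 'R2' (my variation, assuming as above that df has a DatetimeIndex):
# Build one 0/1 indicator column per receiver found in the data,
# then count matches per second with resample
receivers = df.Receiver.unique()
for r in receivers:
    df['new' + r] = ((df.Type == 'Data') & (df.Retry == 1)
                     & (df.Receiver == r)).astype(int)
df[['new' + r for r in receivers]].resample('s').sum()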
Approach 2 -- unstacking + resample
df = df[df.Type == 'Data'].set_index('Receiver', append=True)
df['Retry'].unstack().resample('s').sum().fillna(0)
Results with either approach are essentially the same, with some minor formatting differences (this is the output from Approach 1):
newR1 newR2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
Quick discussion of the different methods: in many cases, both unstack and pivot_table can be used to accomplish similar tasks, as happens here. But pivot_table (as in @Dark's answer) seems clearly superior for this problem; I'll leave my unstack answer here solely for comparison purposes.
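The two are in fact closely related: pivot_table is essentially a groupby followed by an unstack. A sketch of the correspondence for this problem (illustrative, assuming df holds the original data):
dat = df[df['Type'].eq('Data')]
via_pivot = dat.pivot_table(index=pd.Grouper(key='Index', freq='s'),
                            columns='Receiver', values='Retry',
                            aggfunc='sum', fill_value=0)
via_unstack = dat.groupby([pd.Grouper(key='Index', freq='s'), 'Receiver'])\
                 ['Retry'].sum().unstack(fill_value=0)
# Same table either way (dtypes may differ across pandas versions)
assert (via_pivot == via_unstack).all().all()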
Upvotes: 2
Reputation: 30605
It can be done with a simple pivot_table. Since Retry only takes the values 0 and 1, summing it counts exactly the Retry == 1 rows, so no explicit Retry filter is needed:
ndf = df[df['Type'].eq('Data')].pivot_table(columns='Receiver', values='Retry',
                                            index=pd.Grouper(key='Index', freq='s'),
                                            aggfunc='sum', fill_value=0)
Receiver R1 R2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
Upvotes: 4
Reputation: 1352
I hope I'm not missing anything in the OP's question. Using the example data above, the code below yields the desired output.
Code:
data = '' # <the OP's dataframe here>
def my_grouper(df):
    df = df[df['Type'] == 'Data']
    return df.groupby(['Receiver'])['Retry'].sum()
grouped_data = data.groupby(pd.Grouper(freq='1S')).apply(my_grouper)
grouped_data = pd.DataFrame(grouped_data).unstack(1).fillna(0)
grouped_data.columns = grouped_data.columns.droplevel(0)
print(grouped_data)
Output:
Receiver R1 R2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
Upvotes: 2