Reputation: 319
Consider the dataframe "data", indexed by Timestamp, as follows:
Index Receiver Type Retry
1970-01-01 00:00:00.000000000 R1 Data 1
1970-01-01 00:00:00.800000000 R1 Non-Data 1
1970-01-01 00:00:01.000287000 R2 Data 0
1970-01-01 00:00:01.600896000 R2 Data 1
1970-01-01 00:00:02.001388000 R1 Data 1
1970-01-01 00:00:02.004698000 R1 Non-Data 1
1970-01-01 00:00:02.006706000 R2 Data 1
1970-01-01 00:00:02.501351000 R2 Data 1
1970-01-01 00:00:02.810382000 R1 Data 0
1970-01-01 00:00:03.001981000 R1 Data 0
1970-01-01 00:00:03.377116000 R1 Data 0
1970-01-01 00:00:03.701811000 R2 Data 1
1970-01-01 00:00:03.910326000 R2 Data 0
1970-01-01 00:00:03.951355000 R2 Non-Data 1
I want to find all rows where Type is "Data" and Retry equals 1, then group the index into 1-second intervals and count the matches for each Receiver.
Desired Output as df2:
Index R1 R2
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
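(For reproducibility: a minimal sketch that rebuilds this frame from the table above. The construction itself is not from the original post; 'Index' is kept as a regular datetime column, matching the set_index('Index') calls in the answers below.)
import pandas as pd

data = pd.DataFrame({
    'Index': pd.to_datetime([
        '1970-01-01 00:00:00.000000000', '1970-01-01 00:00:00.800000000',
        '1970-01-01 00:00:01.000287000', '1970-01-01 00:00:01.600896000',
        '1970-01-01 00:00:02.001388000', '1970-01-01 00:00:02.004698000',
        '1970-01-01 00:00:02.006706000', '1970-01-01 00:00:02.501351000',
        '1970-01-01 00:00:02.810382000', '1970-01-01 00:00:03.001981000',
        '1970-01-01 00:00:03.377116000', '1970-01-01 00:00:03.701811000',
        '1970-01-01 00:00:03.910326000', '1970-01-01 00:00:03.951355000']),
    'Receiver': ['R1', 'R1', 'R2', 'R2', 'R1', 'R1', 'R2',
                 'R2', 'R1', 'R1', 'R1', 'R2', 'R2', 'R2'],
    'Type': ['Data', 'Non-Data', 'Data', 'Data', 'Data', 'Non-Data', 'Data',
             'Data', 'Data', 'Data', 'Data', 'Data', 'Data', 'Non-Data'],
    'Retry': [1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1]})
df = data  # most answers below refer to it as df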
Upvotes: 2
Views: 675
Reputation: 402293
I originally set the bounty because I couldn't find the time to help the OP, but I did manage to find some time and have provided four options, all of which address the OP's question. I've also cleaned up the question to remove redundant/stale information and prevent confusion.
Option 1
Use query/eval/boolean indexing to filter rows, and then use get_dummies + resample to get your output -
df = df.query("Type == 'Data' and Retry == 1")\
.set_index('Index').Receiver.str.get_dummies().resample('1S').sum()
df
R1 R2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
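To see why this works, here is the intermediate frame produced by str.get_dummies on the filtered rows (derived from the example data and starting again from the original, unfiltered df; shown for illustration, not part of the original answer):
# One 0/1 indicator column per receiver; resample('1S').sum()
# then simply counts the 1s that fall inside each second.
dummies = df.query("Type == 'Data' and Retry == 1")\
            .set_index('Index').Receiver.str.get_dummies()
# dummies:
#                                R1  R2
# 1970-01-01 00:00:00.000000000   1   0
# 1970-01-01 00:00:01.600896000   0   1
# 1970-01-01 00:00:02.001388000   1   0
# 1970-01-01 00:00:02.006706000   0   1
# 1970-01-01 00:00:02.501351000   0   1
# 1970-01-01 00:00:03.701811000   0   1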
Similar solution using get_dummies + groupby -
df = df.query("Type == 'Data' and Retry == 1").set_index("Index")\
.Receiver.str.get_dummies().groupby(pd.Grouper(freq='s')).sum()
df
R1 R2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
In fact, resample and groupby + pd.Grouper are pretty much interchangeable operations.
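A quick sanity check of that equivalence (a sketch, assuming s is any Series with a DatetimeIndex; for aggregations like sum the two produce identical bins):
# pd.Grouper(freq=...) on a DatetimeIndex uses the same binning as resample
lhs = s.resample('1S').sum()
rhs = s.groupby(pd.Grouper(freq='1S')).sum()
assert lhs.equals(rhs)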
Option 2
crosstab + resample -
i = df.query("Type == 'Data' and Retry == 1").set_index('Index').Receiver
df = pd.crosstab(i.index, i.values).resample('1S').sum()
df.index.name = df.columns.name = None
df
R1 R2
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
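A variant of Option 2 (my addition, not from the original answer) skips the resample step by flooring each timestamp to its containing second before tabulating. Note that, unlike resample, this drops seconds with no matching rows:
i = df.query("Type == 'Data' and Retry == 1").set_index('Index').Receiver
# Floor timestamps to the second, then count directly with crosstab
out = pd.crosstab(i.index.floor('s'), i.values)
out.index.name = out.columns.name = None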
Option 3
groupby + unstack (I presume this to be the slowest one) -
df = df.query("Type == 'Data' and Retry == 1")\
.set_index('Index')\
.groupby([pd.Grouper(freq='1S'), 'Receiver'])\
.Receiver.count()\
.unstack()\
.fillna(0)
df
Receiver R1 R2
Index
1970-01-01 00:00:00 1.0 0.0
1970-01-01 00:00:01 0.0 1.0
1970-01-01 00:00:02 1.0 2.0
1970-01-01 00:00:03 0.0 1.0
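The float output comes from fillna operating on columns that held NaN; if you want the integer counts shown in the desired output, cast at the end (this applies equally to Option 4 below):
df = df.astype(int)  # restore integer counts after fillna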
Option 4
unstack + resample -
df = df.query("Type == 'Data' and Retry == 1")\
.set_index(['Index', 'Receiver'])\
.assign(foo=1)\
.foo.unstack(-1)\
.resample('s')\
.sum()\
.fillna(0)
df
Receiver R1 R2
Index
1970-01-01 00:00:00 1.0 0.0
1970-01-01 00:00:01 0.0 1.0
1970-01-01 00:00:02 1.0 2.0
1970-01-01 00:00:03 0.0 1.0
Upvotes: 4
Reputation: 30414
Approach 1 -- masking + resample
for r in ['R1', 'R2']:
    df['new' + r] = ((df.Type == 'Data') & (df.Retry == 1) & (df.Receiver == r)).astype(int)
df[['newR1', 'newR2']].resample('s').sum()
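A sketch generalizing Approach 1 to whatever receivers actually appear in the data, rather than hardcoding 'R1' and 'R2' (my variation, assuming as above that df has a DatetimeIndex):
# Build one 0/1 indicator column per receiver found in the data,
# then count matches per second with resample
receivers = df.Receiver.unique()
for r in receivers:
    df['new' + r] = ((df.Type == 'Data') & (df.Retry == 1)
                     & (df.Receiver == r)).astype(int)
df[['new' + r for r in receivers]].resample('s').sum()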
Approach 2 -- unstacking + resample
df = df[df.Type == 'Data'].set_index('Receiver', append=True)
df['Retry'].unstack().resample('s').sum().fillna(0)
Results with either approach are essentially the same, with some minor formatting differences (this is the output from Approach 1):
newR1 newR2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
Quick discussion of the different methods: in many cases, both unstack and pivot_table can be used to accomplish similar tasks, as happens here. But pivot_table (as in @Dark's answer) seems clearly superior for this problem; I'll leave my unstack answer here solely for comparison purposes.
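The two are in fact closely related: pivot_table is essentially a groupby followed by an unstack. A sketch of the correspondence for this problem (illustrative, assuming df holds the original data):
dat = df[df['Type'].eq('Data')]
via_pivot = dat.pivot_table(index=pd.Grouper(key='Index', freq='s'),
                            columns='Receiver', values='Retry',
                            aggfunc='sum', fill_value=0)
via_unstack = dat.groupby([pd.Grouper(key='Index', freq='s'), 'Receiver'])\
                 ['Retry'].sum().unstack(fill_value=0)
# Same table either way (dtypes may differ across pandas versions)
assert (via_pivot == via_unstack).all().all()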
Upvotes: 2
Reputation: 30605
It can be done with a simple pivot_table. Since Retry only takes the values 0 and 1, summing it counts exactly the Retry == 1 rows, so no explicit Retry filter is needed:
ndf = df[df['Type'].eq('Data')].pivot_table(columns='Receiver', values='Retry',
                                            index=pd.Grouper(key='Index', freq='s'),
                                            aggfunc='sum', fill_value=0)
Receiver R1 R2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
Upvotes: 4
Reputation: 1352
I hope I'm not missing anything in the OP's question. Using the example data above, the code below yields the desired output.
Code:
data = '' # <the OP's dataframe here>
def my_grouper(df):
    df = df[df['Type'] == 'Data']
    return df.groupby(['Receiver'])['Retry'].sum()
grouped_data = data.groupby(pd.Grouper(freq='1S')).apply(my_grouper)
grouped_data = pd.DataFrame(grouped_data).unstack(1).fillna(0)
grouped_data.columns = grouped_data.columns.droplevel(0)
print(grouped_data)
Output:
Receiver R1 R2
Index
1970-01-01 00:00:00 1 0
1970-01-01 00:00:01 0 1
1970-01-01 00:00:02 1 2
1970-01-01 00:00:03 0 1
Upvotes: 2