Reputation: 470
I have a large time series dataset with some of the observations (each with a unique ID) having a different length. I also have a 'Section' column that counts time step or rows for each unique ID.
df.groupby([df['ID']]).agg({'count'})
A B Z
count count ... count
ID
25782 194 194 194
25783 198 198 198
25784 194 194 194
25785 192 192 192
... ... ... ... ...
25787 192 192 192
25788 195 195 195
25789 196 196 196
25790 200 200 200
say I want to create a new dataframe consisting only where the length of unique ID = 192. I.e 'Section' counts up to 192.
So far I have tried the following but for no avail. Please help.
mask = df.groupby('ID')(len(df['Section']) == 192)
df = df.loc[mask]
print(df)
AND
df.groupby('ID').df[df['Section'].max() == 192]
edit
Desired output
new_df.groupby([new_df['ID']]).agg({'count'})
A B Z
count count ... count
ID
25752 192 192 192
25137 192 192 192
25970 192 192 192
25440 192 192 192
Upvotes: 2
Views: 1272
Reputation: 29635
You can use filter
after the groupby
to keep only the ID
where the length of the 'Section' column is 192, such as:
new_df = df.groupby('ID').filter(lambda x: len(x['Section']) == 192)
Then when you do new_df.groupby('ID').agg({'count'})
you should get your expected output
Upvotes: 2