A.Abs
A.Abs

Reputation: 470

Collect rows based on unique ID Pandas dataframe

I have a large time series dataset with some of the observations (each with a unique ID) having a different length. I also have a 'Section' column that counts time step or rows for each unique ID.

df.groupby([df['ID']]).agg({'count'})

                                  A             B                         Z
                                 count         count         ...         count   
    ID                                                        
    25782                          194           194                      194   
    25783                          198           198                      198   
    25784                          194           194                      194   
    25785                          192           192                      192   
   ...                             ...           ...          ...          ...
    25787                          192           192                      192   
    25788                          195           195                      195   
    25789                          196           196                      196   
    25790                          200           200                      200   

say I want to create a new dataframe consisting only where the length of unique ID = 192. I.e 'Section' counts up to 192.

So far I have tried the following but for no avail. Please help.

mask = df.groupby('ID')(len(df['Section']) == 192)
df = df.loc[mask]
print(df)

AND

df.groupby('ID').df[df['Section'].max() == 192]

edit

Desired output

new_df.groupby([new_df['ID']]).agg({'count'})

                               A             B                         Z
                             count         count         ...         count   
ID                                                        
25752                          192           192                      192   
25137                          192           192                      192   
25970                          192           192                      192   
25440                          192           192                      192  

Upvotes: 2

Views: 1272

Answers (1)

Ben.T
Ben.T

Reputation: 29635

You can use filter after the groupby to keep only the ID where the length of the 'Section' column is 192, such as:

new_df = df.groupby('ID').filter(lambda x: len(x['Section']) == 192)

Then when you do new_df.groupby('ID').agg({'count'}) you should get your expected output

Upvotes: 2

Related Questions