Pandas DataFrame: how to remove all rows for an id when that id has fewer than a given number of rows?

Question

I have a dataframe that looks something like

     date        id      params 
123  2016-03-02  0A122B  23.7
124  2016-03-03  0A122B  25.5
125  2016-03-04  0A122B  29.7
126  2016-03-07  0A122B  26.4
... 
456  2016-03-02  3B778C  1050
457  2016-03-03  3B778C  1350
458  2016-03-04  3B778C  2900
...
1255 2016-03-02  5D898F  135.88
1256 2016-03-03  5D898F  189.55
1257 2016-03-04  5D898F  205.22
1258 2016-03-07  5D898F  278.35
1259 2016-03-08  5D898F  145.64

Each unique id has a set of rows, one per date, with its params. Note that the number of rows per id can differ. For example, 0A122B may have only 48 rows, while 5D898F may have 1255.

I'd like to know a way to remove all rows belonging to an id (e.g. 0A122B) when that id's total number of rows is less than some number, say 50, and to apply this to every id.

not_speshal · Accepted Answer

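Try groupby with transform, which gives every row the row count of its id group, then keep only the rows whose count is at least 50:

output = df[df.groupby("id")["date"].transform("count") >= 50]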
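
To see the idea on something small, here is a minimal sketch with made-up data (the ids "A", "B", "C" and the threshold `min_rows = 2` are stand-ins for the real ids and 50, not taken from the question's data). It uses "size" rather than "count" so that rows with a missing date are still counted, and it also shows `groupby(...).filter(...)` as a possible alternative.

```python
import pandas as pd

# Hypothetical toy data: "A" has 3 rows, "B" has 1 row, "C" has 2 rows.
df = pd.DataFrame({
    "date": ["2016-03-02", "2016-03-03", "2016-03-04",
             "2016-03-02", "2016-03-02", "2016-03-03"],
    "id": ["A", "A", "A", "B", "C", "C"],
    "params": [23.7, 25.5, 29.7, 1050.0, 135.88, 189.55],
})

min_rows = 2  # stand-in for the threshold of 50 in the question

# transform("size") returns, for every row, the number of rows in its id group,
# aligned with the original index, so it can be used directly as a boolean mask.
group_sizes = df.groupby("id")["date"].transform("size")
output = df[group_sizes >= min_rows]

# Alternative: filter() keeps or drops whole groups based on a predicate.
# It reads naturally but calls a Python function once per group.
alt = df.groupby("id").filter(lambda g: len(g) >= min_rows)

print(output)
```

Both approaches keep the original row order and index. The transform version avoids a per-group Python call, which may matter when there are many distinct ids.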
