yasi

Reputation: 547

How to remove one dictionary from dataframe

I have the following dataframe:

df.head()

I built a dictionary of dataframes, one entry per unique appId, as you can see below:

one dict

with this command:

dfs = dict(tuple(timeseries.groupby('appId')))

After that, I want to remove from my dataframe every appId group (dictionary entry) that has fewer than 30 rows. I removed those entries from my dictionary (dfs) and then tried this code:

pd.concat([dfs]).drop_duplicates(keep=False)

but it doesn't work.

Upvotes: 0

Views: 1001

Answers (1)

jezrael

Reputation: 862601

I believe you need transform with 'size' to get the row count of each group, and then filter by boolean indexing:

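# dfs is a dict of DataFrames, so pass its values to concat (pd.concat([dfs]) raises a TypeError)
df = pd.concat(list(dfs.values()))
# keep only rows whose appId group has at least 30 rows
df = df[df.groupby('appId')['appId'].transform('size') >= 30]
# alternative 1
# df = df[df.groupby('appId')['appId'].transform('size').ge(30)]
# alternative 2 (slower on large data)
# df = df.groupby('appId').filter(lambda x: len(x) >= 30)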
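To make the mask concrete, here is a minimal sketch on a toy frame. The column values and the threshold of 3 (instead of 30) are made up for illustration and are not from the question's data:

```python
import pandas as pd

# Toy data (hypothetical): appId 'a' has 3 rows, 'b' has 1 row, 'c' has 2 rows
timeseries = pd.DataFrame({
    'appId': ['a', 'a', 'a', 'b', 'c', 'c'],
    'value': [1, 2, 3, 4, 5, 6],
})

min_rows = 3  # the question uses 30; 3 keeps the example small

# transform('size') returns a Series aligned with the original rows,
# where each row holds the number of rows in its appId group
group_sizes = timeseries.groupby('appId')['appId'].transform('size')

# Boolean indexing keeps only the rows that belong to large enough groups
filtered = timeseries[group_sizes >= min_rows]
print(filtered)
```

Because the mask has one entry per original row, no concat or merge step is needed after filtering.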

Another approach is to filter the dictionary itself:

dfs = {k: v for k, v in dfs.items() if len(v) >= 30}
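If you go the dictionary route and still want a single dataframe afterwards, the surviving entries can be joined back together. A small sketch, again on made-up toy data with the threshold lowered to 3; note that concat gets the dictionary's values, not the dictionary wrapped in a list:

```python
import pandas as pd

# Toy data (hypothetical): appId 'a' has 3 rows, 'b' has 1 row, 'c' has 2 rows
timeseries = pd.DataFrame({
    'appId': ['a', 'a', 'a', 'b', 'c', 'c'],
    'value': [1, 2, 3, 4, 5, 6],
})

# Same command as in the question: one DataFrame per unique appId
dfs = dict(tuple(timeseries.groupby('appId')))

min_rows = 3  # the question uses 30; 3 keeps the example small

# Drop the dictionary entries whose DataFrame is too short
dfs = {k: v for k, v in dfs.items() if len(v) >= min_rows}

# Rebuild one DataFrame from the remaining groups
result = pd.concat(list(dfs.values()))
print(result)
```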

EDIT:

timeseries = timeseries[timeseries.groupby('appId')['appId'].transform('size') >= 30]
dfs = dict(tuple(timeseries.groupby('appId')))

Upvotes: 1
