Reputation: 175
I have a dataset which looks like this:
venue_id,latitude,longitude,venue_category,country_code,user_id,uct_time,time_offset
4af833a6f964a5205a0b22e3,13.693775,100.751152,Airport,TH,4337,Tue Apr 03 20:35:48 +0000 2012,420
4af833a6f964a5205a0b22e3,13.693775,100.751152,Airport,TH,101773,Tue Apr 03 20:46:53 +0000 2012,420
4af833a6f964a5205a0b22e3,13.693775,100.751152,Airport,TH,105093,Tue Apr 03 22:39:56 +0000 2012,420
4af833a6f964a5205a0b22e3,13.693775,100.751152,Airport,TH,58835,Tue Apr 03 22:54:52 +0000 2012,420
....
and I need to remove the rows whose venue_id occurs fewer than 100 times.
I have tried to use the following code:
joined = joined[joined.groupby("venue_id").venue_id.transform(len) >= 100]
which is inspired by the answer from the question with ID 13446480.
The problem is that it gives me the following error:
AttributeError: 'DataFrameGroupBy' object has no attribute 'venue_id'
Please bear in mind that I am new to pandas and I want to learn, so I would be grateful if you could give some explanation as well.
Cheers,
Dan
Upvotes: 2
Views: 38
Reputation: 862691
It seems the first column is the index, so reset_index should help.
You need:
joined = joined.reset_index()
joined = joined[joined.groupby("venue_id")['venue_id'].transform(len) >= 100]
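To illustrate why the error appears, here is a minimal sketch on made-up toy data (the values and the `min_count` threshold are assumptions for the example, not taken from the question). It assumes a recent pandas version, where `groupby("venue_id")` accepts an index level name but `.venue_id` still fails because `venue_id` is not a column:

```python
import pandas as pd

# Toy data (made up for illustration): venue "a" occurs 3 times, "b" once.
df = pd.DataFrame({
    "venue_id": ["a", "a", "b", "a"],
    "latitude": [13.69, 13.69, 40.71, 13.69],
})

# Simulate the suspected situation: venue_id is the index, not a column.
indexed = df.set_index("venue_id")
try:
    indexed.groupby("venue_id").venue_id
    raised = False
except AttributeError:
    # venue_id is an index level here, so attribute access on the groupby fails
    raised = True

# After reset_index, venue_id is a regular column again and the filter works.
min_count = 3  # hypothetical threshold; the question uses 100
restored = indexed.reset_index()
counts = restored.groupby("venue_id")["venue_id"].transform(len)
filtered = restored[counts >= min_count]
```

transform(len) returns one value per row (the size of that row's group), so comparing it with the threshold gives a boolean mask aligned with the original rows.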
It also works for me when the first column is the index, without needing reset_index:
joined = joined[joined.groupby("venue_id").transform(len) >= 100]
If you are not using the latest version of pandas (0.20.1), you need to group by the index level and select a column:
joined = joined[joined.groupby(level="venue_id")['latitude'].transform(len) >= 100]
EDIT1:
Using 'size' instead of len is faster:
joined = joined[joined.groupby("venue_id")['latitude'].transform('size') >= 100]
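As a rough, self-contained sketch of the 'size' variant, the filter can be wrapped in a small helper. The function name `drop_rare_venues`, the toy data and the threshold of 2 are all hypothetical, chosen only so the result is easy to check by eye:

```python
import pandas as pd

def drop_rare_venues(frame, min_count):
    """Keep only rows whose venue_id occurs at least min_count times."""
    # transform("size") broadcasts each group's row count back onto its rows
    sizes = frame.groupby("venue_id")["latitude"].transform("size")
    return frame[sizes >= min_count]

# Toy data: "a" occurs 3 times, "b" twice, "c" once.
checkins = pd.DataFrame({
    "venue_id": ["a", "b", "a", "c", "b", "a"],
    "latitude": [1.0, 2.0, 1.0, 3.0, 2.0, 1.0],
})
kept = drop_rare_venues(checkins, min_count=2)

# The len-based mask from earlier selects the same rows.
len_mask = checkins.groupby("venue_id")["latitude"].transform(len) >= 2
same_rows = kept.equals(checkins[len_mask])
```

With the real data, the call would be drop_rare_venues(joined, 100), assuming venue_id is a column.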
Upvotes: 1