Reputation: 4498
In a Python pandas DataFrame "df", I have the following columns:
user_id | song_id | song_duration | song_title | artist | listen_count
Many users might have listened to the same song - therefore the song is not unique in this table. I would like to create a second dataframe with just song information (with unique song_ids).
song_id | song_title | artist
I managed to create a table with song_id and song_title:
song_df = df.groupby('song_id').song_title.first()
How can I add the column "artist" to this?
This doesn't work:
song_df = df.groupby('song_id').df['song_title','artist'].first()
AttributeError: 'DataFrameGroupBy' object has no attribute 'df'
Upvotes: 3
Views: 3357
Reputation: 732
You could just select the columns you need and drop the duplicate rows:
song_df = df[['song_id','song_title','artist']].drop_duplicates()
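One caveat, sketched below with made-up sample data: `drop_duplicates()` compares all selected columns, so a song whose title or artist is spelled differently in two rows would still appear twice. Assuming you want exactly one row per `song_id`, passing `subset="song_id"` keeps only the first row seen for each id. The sample values here are hypothetical and only illustrate the behaviour.

```python
import pandas as pd

# Hypothetical sample data; note the inconsistent artist spelling for s1.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 1],
    "song_id": ["s1", "s1", "s2", "s2"],
    "song_duration": [200, 200, 180, 180],
    "song_title": ["Song A", "Song A", "Song B", "Song B"],
    "artist": ["Artist X", "artist x", "Artist Y", "Artist Y"],
    "listen_count": [5, 2, 7, 1],
})

# Keep the first row for each song_id, then renumber the index.
song_df = (
    df[["song_id", "song_title", "artist"]]
    .drop_duplicates(subset="song_id")
    .reset_index(drop=True)
)

print(song_df)
```

Without `subset`, the two differently spelled `s1` rows would both survive, because they are not exact duplicates.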
Upvotes: 0
Reputation: 863501
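IIUC, try omitting .df and selecting the columns with a list (double brackets), since recent pandas versions no longer accept a tuple of column names here:
df.groupby('song_id')[['song_title','artist']].first()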
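A minimal sketch of the groupby approach on made-up data, in case it helps. It assumes a recent pandas version, where several columns must be selected from a groupby with a list (double brackets), and it adds `reset_index()` so that `song_id` ends up as a regular column rather than the index. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with repeated song_ids.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 1],
    "song_id": ["s1", "s1", "s2", "s2"],
    "song_duration": [200, 200, 180, 180],
    "song_title": ["Song A", "Song A", "Song B", "Song B"],
    "artist": ["Artist X", "Artist X", "Artist Y", "Artist Y"],
    "listen_count": [5, 2, 7, 1],
})

# Take the first title/artist per song_id; the list selects both columns.
song_info = (
    df.groupby("song_id")[["song_title", "artist"]]
    .first()
    .reset_index()
)

print(song_info)
```

Note that `first()` returns the first non-null value per column within each group, so the title and artist could come from different rows if some values are missing.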
Upvotes: 1