I am working with a pandas DataFrame that looks like this:
   title              view_count  comment_count  like_count  dislike_count  dog_tag  cat_tag  bird_tag  other_tag
0  Great Dane Loves       299094          752.0       15167             58    [dog]       []        []         []
1  Guy Loves His Cat      181320         1283.0       13254            262       []    [cat]        []         []
The title column holds the name of a YouTube video. If the video is about dogs, its dog_tag value is [dog]; if it is not, dog_tag holds an empty list [].
I need to create a new data frame containing title, view_count, comment_count, like_count, dislike_count and dog_tag for every row whose dog_tag is [dog]. Rows whose dog_tag is [] should be left out.
So the new data frame should look like this:
   title                 view_count  comment_count  like_count  dislike_count  dog_tag
0  Great Dane Loves          299094          752.0       15167             58    [dog]
1  Dogs are Soo Great!!      181320         1283.0       13254            262    [dog]
2  Dog and Little Girl       562585         5658.3       46589            121    [dog]
Can anyone help with this? I tried the following approaches that I found on Stack Overflow, but none of them gave me what I need:
only_dog = [dodo_data.loc[:, dodo_data.loc[0,:].eq(s)] for s in ['dog_tag', 'view_count', 'comment_count', 'like_count', 'dislike_count','ratio_of_comments_per_view', 'ratio_of_likes_per_view']]
dodo_data.loc[:,dodo_data.iloc[0, :] == "dog_tag"]
dodo_data.loc[:,dodo_data.iloc[0, :] == "view_count"]
dodo_data.loc[:,dodo_data.iloc[0, :] == "comment_count"]
You can try this:
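import io
import pandas as pd

# Rebuild the sample data from CSV text; read_csv loads the tag columns as plain strings such as '[dog]'
dff = io.StringIO("""title,view_count,comment_count,like_count,dislike_count,dog_tag,cat_tag,bird_tag,other_tag
Great Dane Loves,299094,752.0,15167,58,[dog],[],[],[]
Guy Loves His Cat,181320,1283.0,13254,262,[],[cat],[],[]""")
df2 = pd.read_csv(dff)
# Keep only the rows tagged as dogs
df2 = df2[df2['dog_tag'] == '[dog]']
# Drop every *_tag column except dog_tag (the lookbehind rejects a match that ends in 'dog_tag')
df2 = df2[df2.columns.drop(list(df2.filter(regex=r'_tag(?<!dog_tag)')))]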
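The comparison with '[dog]' works in this answer because read_csv turns the tag columns into strings. If, as the printout in the question suggests, dog_tag actually holds Python lists, comparing each cell with the string '[dog]' matches no rows. A minimal sketch for that case, assuming list-valued tag columns and using a trimmed-down, hypothetical version of the sample data, tests membership inside each list instead:

```python
import pandas as pd

# Assumption: the tag columns hold real Python lists rather than strings like '[dog]'
df = pd.DataFrame({
    'title': ['Great Dane Loves', 'Guy Loves His Cat'],
    'view_count': [299094, 181320],
    'dog_tag': [['dog'], []],
    'cat_tag': [[], ['cat']],
})

# Comparing list cells with the string '[dog]' matches no rows
print((df['dog_tag'] == '[dog]').any())  # False

# Test membership inside each list instead
mask = df['dog_tag'].apply(lambda tags: 'dog' in tags)
# Drop every *_tag column except dog_tag, using the same regex as the answer above
other_tags = df.filter(regex=r'_tag(?<!dog_tag)').columns
only_dog = df.loc[mask, df.columns.drop(other_tags)]
print(only_dog)
```

Checking for 'dog' inside the list, rather than for a non-empty list, also stays correct if a dog_tag cell could ever hold other values.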
Because an empty list converts to False as a boolean, you can use boolean indexing with DataFrame.loc to filter the rows by that condition and select a list of column names in the same step:
cols = ['title', 'view_count', 'comment_count', 'like_count', 'dislike_count', 'dog_tag']
# Non-empty lists such as ['dog'] become True, empty lists [] become False
df = df.loc[df['dog_tag'].astype(bool), cols]
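To see the boolean-indexing approach end to end, here is a small runnable sketch. It assumes the tag columns hold Python lists (the sample rows are copied from the question, and only dog_tag and cat_tag are kept for brevity); it builds the row mask with astype(bool) and selects the requested columns with .loc:

```python
import pandas as pd

# Assumed sample data: the tag columns hold Python lists, as the question's printout suggests
df = pd.DataFrame({
    'title': ['Great Dane Loves', 'Guy Loves His Cat'],
    'view_count': [299094, 181320],
    'comment_count': [752.0, 1283.0],
    'like_count': [15167, 13254],
    'dislike_count': [58, 262],
    'dog_tag': [['dog'], []],
    'cat_tag': [[], ['cat']],
})

# bool(['dog']) is True and bool([]) is False, so astype(bool) yields a row mask
mask = df['dog_tag'].astype(bool)
print(mask.tolist())  # [True, False]

# Filter rows by the mask and pick the requested columns in one .loc call
cols = ['title', 'view_count', 'comment_count', 'like_count', 'dislike_count', 'dog_tag']
only_dog = df.loc[mask, cols]
print(only_dog)
```

Note that this mask only distinguishes empty from non-empty lists; it relies on dog_tag never holding anything other than [dog] or [].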