How can I create a new series by using specific rows and columns of a pandas data frame?

Question

I am working with a pandas data frame which looks as follows:

       title         view_count comment_count like_count    dislike_count   dog_tag cat_tag bird_tag other_tag  
0   Great Dane Loves     299094        752.0      15167          58           [dog]    []       []   [] 
1   Guy Loves His Cat    181320       1283.0      13254         262             []  [cat]       []   []

Basically, title represents the name of the YouTube video. If the video is about dogs, you see [dog] in the dog_tag column. If it is not about dogs, you see an empty list [] in dog_tag.

I need to create a new series which includes title, view_count, comment_count, like_count and dislike_count for every row where the value of dog_tag is [dog]. Rows where the value of dog_tag is [] should not be included.

So, my new series should look like this:

       title         view_count comment_count like_count    dislike_count   dog_tag     
0   Great Dane Loves     299094        752.0      15167          58           [dog]     
1   Dogs are Soo Great!! 181320       1283.0      13254         262           [dog]
2   Dog and Little Girl  562585       5658.3      46589         121           [dog]

Is there any genius person who can solve this problem? I tried the following solutions that I found on Stack Overflow but I could not get what I need :(

only_dog = [dodo_data.loc[:, dodo_data.loc[0,:].eq(s)] for s in ['dog_tag', 'view_count', 'comment_count', 'like_count', 'dislike_count','ratio_of_comments_per_view', 'ratio_of_likes_per_view']]

dodo_data.loc[:,dodo_data.iloc[0, :] == "dog_tag"]
dodo_data.loc[:,dodo_data.iloc[0, :] == "view_count"]
dodo_data.loc[:,dodo_data.iloc[0, :] == "comment_count"]

jezrael · Accepted Answer

Because an empty list converts to False as a boolean, you can use boolean indexing with DataFrame.loc to filter rows by that condition and select columns by a list of column names:

cols = ['title', 'view_count', 'comment_count', 'like_count', 'dislike_count', 'dog_tag']
df = df.loc[df['dog_tag'].astype(bool), cols]
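For illustration, here is a minimal runnable sketch of the same idea. The sample rows are made up to resemble the table in the question, and the names mask and only_dog are just placeholders chosen for this example. Note that the result is a DataFrame rather than a Series.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's table.
df = pd.DataFrame({
    "title": ["Great Dane Loves", "Guy Loves His Cat", "Dog and Little Girl"],
    "view_count": [299094, 181320, 562585],
    "comment_count": [752.0, 1283.0, 5658.0],
    "like_count": [15167, 13254, 46589],
    "dislike_count": [58, 262, 121],
    "dog_tag": [["dog"], [], ["dog"]],
    "cat_tag": [[], ["cat"], []],
})

cols = ["title", "view_count", "comment_count", "like_count", "dislike_count", "dog_tag"]

# astype(bool) maps each list to its truthiness: [] -> False, ['dog'] -> True.
mask = df["dog_tag"].astype(bool)

# Keep only the rows where the mask is True and only the listed columns.
# reset_index(drop=True) renumbers the rows from 0, as in the desired output.
only_dog = df.loc[mask, cols].reset_index(drop=True)
print(only_dog)
```

If you want to keep the original row labels instead of renumbering, leave out reset_index(drop=True).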
