Reyhan

Reputation: 105

How can I create a new series by using specific rows and columns of a pandas data frame?

I am working with a pandas data frame that looks as follows:

               title  view_count  comment_count  like_count  dislike_count  dog_tag  cat_tag  bird_tag  other_tag
0   Great Dane Loves      299094          752.0       15167             58    [dog]       []        []         []
1  Guy Loves His Cat      181320         1283.0       13254            262       []    [cat]        []         []

Basically, title is the name of the YouTube video. If the video is about dogs, the dog_tag column contains [dog]. If it is not about dogs, dog_tag contains an empty list [].

I need to create a new series that includes title, view_count, comment_count, like_count and dislike_count for every row where the value of dog_tag is [dog]. Rows where the value of dog_tag is [] should be left out entirely.

So, my new series should look like this:

                  title  view_count  comment_count  like_count  dislike_count  dog_tag
0      Great Dane Loves      299094          752.0       15167             58    [dog]
1  Dogs are Soo Great!!      181320         1283.0       13254            262    [dog]
2   Dog and Little Girl      562585         5658.3       46589            121    [dog]

Can anyone help me solve this problem? I tried the following solutions that I found on Stack Overflow, but I could not get what I need :(

only_dog = [dodo_data.loc[:, dodo_data.loc[0,:].eq(s)] for s in ['dog_tag', 'view_count', 'comment_count', 'like_count', 'dislike_count','ratio_of_comments_per_view', 'ratio_of_likes_per_view']]

dodo_data.loc[:,dodo_data.iloc[0, :] == "dog_tag"]
dodo_data.loc[:,dodo_data.iloc[0, :] == "view_count"]
dodo_data.loc[:,dodo_data.iloc[0, :] == "comment_count"]

Upvotes: 2

Views: 64

Answers (2)

oppressionslayer

Reputation: 7204

You can try this:

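import io
import pandas as pd

# Rebuild the sample data from the question as CSV text
dff = io.StringIO("""title,view_count,comment_count,like_count,dislike_count,dog_tag,cat_tag,bird_tag,other_tag
Great Dane Loves,299094,752.0,15167,58,[dog],[],[],[]
Guy Loves His Cat,181320,1283.0,13254,262,[],[cat],[],[]""")

df2 = pd.read_csv(dff)

# Keep only the dog rows (read_csv loads the tags as strings, so compare with '[dog]')
df2 = df2[df2['dog_tag'] == '[dog]']
# Drop every *_tag column except dog_tag
df2 = df2[df2.columns.drop(list(df2.filter(regex=r'_tag(?<!dog_tag)')))]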

Upvotes: -1

jezrael

Reputation: 862611

Because an empty list converts to False as a boolean, you can use boolean indexing with DataFrame.loc to filter the rows by that condition and select the columns by a list of names:

cols = ['title', 'view_count', 'comment_count', 'like_count', 'dislike_count', 'dog_tag']
df = df.loc[df['dog_tag'].astype(bool), cols]
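
To see the two lines above run end to end, here is a minimal sketch. It assumes the tag columns hold real Python lists (not strings), and it rebuilds the two sample rows from the question by hand; the names mask and only_dog are only for illustration.

```python
import pandas as pd

# Sample frame modeled on the question; the tag columns are assumed to hold real lists.
df = pd.DataFrame({
    "title": ["Great Dane Loves", "Guy Loves His Cat"],
    "view_count": [299094, 181320],
    "comment_count": [752.0, 1283.0],
    "like_count": [15167, 13254],
    "dislike_count": [58, 262],
    "dog_tag": [["dog"], []],
    "cat_tag": [[], ["cat"]],
    "bird_tag": [[], []],
    "other_tag": [[], []],
})

cols = ["title", "view_count", "comment_count", "like_count", "dislike_count", "dog_tag"]

# A non-empty list becomes True and an empty list becomes False
mask = df["dog_tag"].astype(bool)

# Keep the rows where the mask is True and only the listed columns
only_dog = df.loc[mask, cols]
print(only_dog)
```

If the tags were loaded from a CSV file they would be strings such as '[]', and every non-empty string converts to True, so in that case a comparison like df['dog_tag'] == '[dog]' would be needed instead.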

Upvotes: 3
