jo_

Reputation: 731

Find which columns contain a certain value for each row in a dataframe

I have a dataframe, df, shown below. Each row is a story and each column is a word that appears in the corpus of stories. A 0 means the word is absent in the story while a 1 means the word is present.


I want to find which words are present in each story (i.e. col val == 1). How can I go about finding this (preferably without for-loops)?

Thanks!

Upvotes: 1

Views: 1517

Answers (1)

David Erickson

Reputation: 16683

Assuming you are just trying to look at one story, you can filter for the story (let's say story 34972) and transpose the dataframe with:

df_34972 = df[df.index == 34972].T

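and then collect the words whose value is 1 into a list (after transposing, the words are the index and the single column is labelled with the story id, 34972):

df_34972[df_34972[34972] == 1].index.tolist()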
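As a minimal, self-contained sketch of the single-story lookup: the toy dataframe below is made up for illustration (the words and the extra story ids 34973 and 34974 are not from the question), and it assumes the story ids are the index. It uses df.loc to pull the row as a Series rather than transposing, which is a swapped-in variant of the approach above:

```python
import pandas as pd

# Hypothetical toy data: story ids as the index, one 0/1 column per word
df = pd.DataFrame(
    {"cat": [1, 0, 1], "dog": [0, 0, 1], "fish": [1, 1, 0]},
    index=[34972, 34973, 34974],
)

# Pull one story's row as a Series; its index holds the words
row = df.loc[34972]

# Keep only the words whose value is 1 and return them as a list
words_34972 = row[row == 1].index.tolist()
print(words_34972)  # ['cat', 'fish']
```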

To do this for all stories at once, a slightly different technique is needed. Following the link that SammyWemmy provided, you can melt() the dataframe into long format and keep only the rows whose value is 1. From there, group by the story column, which is named 'index' after reset_index() in the example below, and collect each story's words into a list:

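df = df.reset_index().melt(id_vars='index')  # long format: columns 'index', 'variable', 'value'
df = df[df['value'] == 1]  # melt() names the value column 'value' by default
df.groupby('index')['variable'].apply(list)  # one list of words per story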
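For reference, here is the all-stories version run end to end on a small made-up dataframe. The data and the names long_df, present and words_per_story are hypothetical; it assumes the index is unnamed, so that reset_index() produces a column called 'index':

```python
import pandas as pd

# Hypothetical toy data: story ids as the index, one 0/1 column per word
df = pd.DataFrame(
    {"cat": [1, 0, 1], "dog": [0, 0, 1], "fish": [1, 1, 0]},
    index=[34972, 34973, 34974],
)

# Reshape to long format: one row per (story, word) pair
long_df = df.reset_index().melt(id_vars="index")

# Keep only the pairs where the word is present in the story
present = long_df[long_df["value"] == 1]

# Collect the words of each story into a list
words_per_story = present.groupby("index")["variable"].apply(list)
print(words_per_story.to_dict())
# {34972: ['cat', 'fish'], 34973: ['fish'], 34974: ['cat', 'dog']}
```

Note that a story containing none of the words has no rows left after the filter, so it will not appear in the result at all.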

Upvotes: 1
