Reputation: 731
I have a dataframe, df, shown below. Each row is a story and each column is a word that appears in the corpus of stories. A 0 means the word is absent from the story, while a 1 means the word is present.
I want to find which words are present in each story (i.e. the columns whose value is 1 for that row). How can I do this, preferably without for-loops?
Thanks!
Upvotes: 1
Views: 1517
Reputation: 16683
Assuming you are just trying to look at one story, you can filter for the story (let's say story 34972) and transpose the dataframe with:
df_34972 = df[df.index == 34972].T
After the transpose, the words form the index and the single column is named 34972, so you can collect the words whose value is 1 into a list:
df_34972[df_34972[34972] == 1].index.tolist()
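For a single story, a shorter route is to select the row directly instead of transposing. The following is only a sketch on made-up data: the words cat, dog and fish and the second story id 34973 are assumptions for illustration, not taken from the question's dataframe.

```python
import pandas as pd

# Hypothetical toy data: rows are story ids, columns are words (0/1 flags).
df = pd.DataFrame(
    {"cat": [1, 0], "dog": [0, 1], "fish": [1, 1]},
    index=[34972, 34973],
)

# Select one story as a Series indexed by word, then keep the words flagged 1.
story = df.loc[34972]
words_34972 = story.index[story == 1].tolist()
print(words_34972)  # ['cat', 'fish']
```

This assumes the story ids are unique in the index, so that df.loc[34972] returns a Series rather than a dataframe.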
If you are trying to do this for all stories, the technique is slightly different. From the link that SammyWemmy provided, you can melt() the dataframe and filter for the 1 values for each story. From there you can .groupby('story_column'), where the story column is 'index' (created by reset_index()) in the example below:
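# Long format: one row per (story, word) pair, with columns 'index', 'variable', 'value'
df = df.reset_index().melt(id_vars='index')
# Keep only the pairs where the word is present (melt() names this column 'value')
df = df[df['value'] == 1]
# Collect the words of each story into a list
df.groupby('index')['variable'].apply(list)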
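To see what the melt-and-groupby approach produces end to end, here is a minimal runnable sketch. The toy data (words cat, dog, fish and story ids 34972, 34973) and the names long_df and words_by_story are assumptions for illustration. It also assumes the index and the columns are unnamed, so that reset_index() creates a column called 'index' and melt() uses its default 'variable' and 'value' names.

```python
import pandas as pd

# Hypothetical toy data: rows are story ids, columns are words (0/1 flags).
df = pd.DataFrame(
    {"cat": [1, 0], "dog": [0, 1], "fish": [1, 1]},
    index=[34972, 34973],
)

# Long format: one row per (story, word) pair.
long_df = df.reset_index().melt(id_vars="index")

# Keep only the pairs where the word is present.
long_df = long_df[long_df["value"] == 1]

# One list of words per story, as a Series indexed by story id.
words_by_story = long_df.groupby("index")["variable"].apply(list)
print(words_by_story.loc[34972])  # ['cat', 'fish']
print(words_by_story.loc[34973])  # ['dog', 'fish']
```

Note that a story with no words present would not appear in the result at all, because its rows are all filtered out before the groupby.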
Upvotes: 1