Reputation: 1
I have a pandas DataFrame whose column names are words. Each cell holds the number of times that word is used in a mail (one mail per row):
+--------+-------+-------+-------+-------+
| index | word1 | word2 | word3 | word4 |
+--------+-------+-------+-------+-------+
| 0 | 1 | 2 | 1 | 0 |
| 1 | 2 | 3 | 5 | 1 |
| 2 | 0 | 0 | 1 | 0 |
+--------+-------+-------+-------+-------+
Now I need a list of the words (column names) at the end of every row, but only the words that are actually used in that mail (count > 0). Something like this:
+--------+-------+-------+-------+-------+---------------------------+
| index | word1 | word2 | word3 | word4 | text |
+--------+-------+-------+-------+-------+---------------------------+
| 0 | 1 | 2 | 1 | 0 | [word1,word2,word3] |
| 1 | 2 | 3 | 5 | 1 | [word1,word2,word3,word4] |
| 2 | 0 | 0 | 1 | 0 | [word3] |
+--------+-------+-------+-------+-------+---------------------------+
I know I can get a list of all the column names with list(data.columns),
but I don't know how to apply the condition (count > 0) to each row and store the resulting list in a new column.
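A minimal sketch of the desired result, built from the sample counts in the first table (the variable name data follows the question; the column values are copied from the table). It turns the counts into a boolean mask with gt(0) and then, for each row, keeps only the column names where the mask is True:

```python
import pandas as pd

# Word counts per mail, copied from the example table in the question
data = pd.DataFrame(
    {"word1": [1, 2, 0], "word2": [2, 3, 0], "word3": [1, 5, 1], "word4": [0, 1, 0]}
)

# Boolean DataFrame: True where the word occurs at least once in that mail
used = data.gt(0)

# For each row of the mask, select the matching column names and store them as a list
data["text"] = [list(data.columns[mask]) for mask in used.to_numpy()]
print(data)
```

The list comprehension is evaluated before the new column is assigned, so data.columns still contains only the four word columns at that point.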
Upvotes: 0
Views: 79
Reputation: 14847
In [135]: import numpy as np, pandas as pd
In [136]: df = pd.DataFrame(np.random.randint(0, 3, (3, 5)), columns=list('abcde'))
In [137]: df
Out[137]:
a b c d e
0 1 0 1 0 1
1 0 2 0 0 2
2 0 1 1 0 0
In [140]: df['text'] = df.apply(lambda x: df.columns[x.astype(bool)].to_list(), axis=1)
In [141]: df
Out[141]:
a b c d e text
0 1 0 1 0 1 [a, c, e]
1 0 2 0 0 2 [b, e]
2 0 1 1 0 0 [b, c]
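One thing to watch with the apply call above: if it is run again after the text column exists, df.columns and each row x include that column too, so the mask no longer covers only the count columns. A minimal sketch of a re-runnable variant, assuming the count columns are every column except text (the helper name add_text_column is hypothetical). It also uses > 0 instead of astype(bool), which is equivalent for non-negative counts:

```python
import numpy as np
import pandas as pd


def add_text_column(df: pd.DataFrame) -> None:
    """Add (or overwrite) a 'text' column listing the columns with a count > 0."""
    # Exclude a previously added "text" column so the function can be re-run safely
    word_cols = df.columns.drop("text", errors="ignore")
    # Boolean NumPy array, one row per DataFrame row: True where the count is > 0
    mask = df[word_cols].to_numpy() > 0
    # Index the column labels with each boolean row and convert to a plain list
    df["text"] = [word_cols[row].tolist() for row in mask]


df = pd.DataFrame(np.random.randint(0, 3, (3, 5)), columns=list("abcde"))
add_text_column(df)
add_text_column(df)  # a second call overwrites "text" with the same result
print(df)
```

Working on the whole NumPy mask at once avoids calling a Python lambda through apply for every row, although the final list construction is still a per-row loop.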
Upvotes: 1