Colin Minder

Reputation: 1

How to add a column with a list of column names to a pandas DataFrame based on a condition

I have a pandas DataFrame whose column names are words. Each cell holds the number of times that word is used in a mail (one mail per row):

+--------+-------+-------+-------+-------+
| index  | word1 | word2 | word3 | word4 |
+--------+-------+-------+-------+-------+
|      0 |     1 |     2 |     1 |     0 |
|      1 |     2 |     3 |     5 |     1 |
|      2 |     0 |     0 |     1 |     0 |
+--------+-------+-------+-------+-------+

Now I need a new column at the end of every row that holds a list of the words (column names) used in that mail, i.e. only the words whose count is greater than 0. Something like this:

+--------+-------+-------+-------+-------+---------------------------+
| index  | word1 | word2 | word3 | word4 |           text            |
+--------+-------+-------+-------+-------+---------------------------+
|      0 |     1 |     2 |     1 |     0 | [word1,word2,word3]       |
|      1 |     2 |     3 |     5 |     1 | [word1,word2,word3,word4] |
|      2 |     0 |     0 |     1 |     0 | [word3]                   |
+--------+-------+-------+-------+-------+---------------------------+

I know I can get a list of all column names with list(data.columns), but I don't know how to apply the condition per row and store the resulting list in a new column.

Upvotes: 0

Views: 79

Answers (1)

Randy

Reputation: 14847

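Use apply with axis=1: for each row, x.astype(bool) turns non-zero counts into True, and indexing df.columns with that boolean mask keeps only the names of the columns that are used. (The session assumes import numpy as np and import pandas as pd.)

In [136]: df = pd.DataFrame(np.random.randint(0, 3, (3, 5)), columns=list('abcde'))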

In [137]: df
Out[137]:
   a  b  c  d  e
0  1  0  1  0  1
1  0  2  0  0  2
2  0  1  1  0  0

In [140]: df['text'] = df.apply(lambda x: df.columns[x.astype(bool)].to_list(), axis=1)

In [141]: df
Out[141]:
   a  b  c  d  e       text
0  1  0  1  0  1  [a, c, e]
1  0  2  0  0  2     [b, e]
2  0  1  1  0  0     [b, c]

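A sketch applied to the question's own word counts, assuming the counts are never negative (so testing > 0 is equivalent to the astype(bool) test above). Instead of a row-wise apply it builds one NumPy boolean mask and uses a plain list comprehension, a common alternative; the names words and used are illustrative only.

```python
import pandas as pd

# Word counts from the question: one row per mail, one column per word.
df = pd.DataFrame(
    {"word1": [1, 2, 0], "word2": [2, 3, 0], "word3": [1, 5, 1], "word4": [0, 1, 0]}
)

# Capture the word columns before the "text" column is added.
words = df.columns.to_list()

# Boolean mask: True where the word is used (count > 0) in that mail.
used = df[words].gt(0).to_numpy()

# For each row, keep the column names whose mask entry is True.
df["text"] = [[w for w, flag in zip(words, row) if flag] for row in used]

print(df)
```

For the question's table this yields [word1, word2, word3], [word1, word2, word3, word4] and [word3] in the text column.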
Upvotes: 1
