L J
L J

Reputation: 3

How to create new column of lists in Pandas dataframe using existing column names and values?

I'm currently using R for data science and I'm learning Python and Pandas to expand my toolkit. I'd like to create a new column of lists within a Pandas dataframe using existing column names and values.

For the following Pandas dataframe:

  test1  test2  test3
1      0      1      1
2      0      1      0
3      1      1      1
4      1      0      0
5      0      0      0

A new column would contain a list for each row that takes the column name wherever there's a '1' value, strips the 'test' prefix, and concatenates the list using a '-' separator.

   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0  

I can create the column in R and data.table using the following code:

df [, combo := apply (df == 1, 1, function(x) {
   paste (gsub("test", "", names(which(x))), collapse = "-")
}
)]

This is the closest I've come in Pandas:

def test(x):
    paste(loc[x])

df['combo'] = df.apply(test, df == 1, axis = 1)

TypeError: apply() got multiple values for argument 'axis'

Am I on the right path?

Upvotes: 0

Views: 935

Answers (2)

Allen Qin
Allen Qin

Reputation: 19947

You can first rename column and then use apply to extract the column names and then join them.

df['combo'] = (
   df.rename(columns=lambda x: x.replace('test',''))
   .astype(bool)
   .apply(lambda x: '-'.join(x.loc[x].index), axis=1)
)

df
Out[15]: 
   test1  test2  test3  combo
1      0      1      1    2-3
2      0      1      0      2
3      1      1      1  1-2-3
4      1      0      0      1
5      0      0      0       

Upvotes: 0

Haleemur Ali
Haleemur Ali

Reputation: 28233

df['combo'] = df.apply(lambda x: '-'.join(list(x[x == 1].index)).replace('test', ''), axis=1)

produces the following output:

In [8]: df
Out[8]:
   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0

The function lambda x: '-'.join(list(x[x == 1].index)).replace('test', '') picks the index of series elements that are equal to 1. The index for rows are the column names test1, test2, test3, so after joining the list, it is necessary to replace 'test' from the resulting string with ''.

And we need to apply this function along rows, hence we pass axis=1. The default axis=0 applies the function along columns.

Upvotes: 1

Related Questions