Reputation: 3
I'm currently using R for data science and I'm learning Python and Pandas to expand my toolkit. I'd like to create a new column of lists within a Pandas dataframe using existing column names and values.
For the following Pandas dataframe:
test1 test2 test3
1 0 1 1
2 0 1 0
3 1 1 1
4 1 0 0
5 0 0 0
A new column would contain a list for each row that takes the column name wherever there's a '1' value, strips the 'test' prefix, and concatenates the list using a '-' separator.
test1 test2 test3 combo
0 0 1 1 2-3
1 0 1 0 2
2 1 1 1 1-2-3
3 1 0 0 1
4 0 0 0
I can create the column in R and data.table using the following code:
df [, combo := apply (df == 1, 1, function(x) {
paste (gsub("test", "", names(which(x))), collapse = "-")
}
)]
This is the closest I've come in Pandas:
def test(x):
paste(loc[x])
df['combo'] = df.apply(test, df == 1, axis = 1)
TypeError: apply() got multiple values for argument 'axis'
Am I on the right path?
Upvotes: 0
Views: 935
Reputation: 19947
You can first rename column and then use apply to extract the column names and then join them.
df['combo'] = (
df.rename(columns=lambda x: x.replace('test',''))
.astype(bool)
.apply(lambda x: '-'.join(x.loc[x].index), axis=1)
)
df
Out[15]:
test1 test2 test3 combo
1 0 1 1 2-3
2 0 1 0 2
3 1 1 1 1-2-3
4 1 0 0 1
5 0 0 0
Upvotes: 0
Reputation: 28233
df['combo'] = df.apply(lambda x: '-'.join(list(x[x == 1].index)).replace('test', ''), axis=1)
produces the following output:
In [8]: df
Out[8]:
test1 test2 test3 combo
0 0 1 1 2-3
1 0 1 0 2
2 1 1 1 1-2-3
3 1 0 0 1
4 0 0 0
The function lambda x: '-'.join(list(x[x == 1].index)).replace('test', '')
picks the index of series elements that are equal to 1. The index for rows are the column names test1, test2, test3
, so after joining the list, it is necessary to replace 'test'
from the resulting string with ''
.
And we need to apply this function along rows, hence we pass axis=1
. The default axis=0
applies the function along columns.
Upvotes: 1