Reputation: 47
I cant seem to figure it out.I need to group the columns with value 1 into a new column "New_column". Can someone help me. This is the code that I have tried.
for (index_label, row_series) in data.iterrows():
print('Row Index label : ', index_label)
print('Row Content as Series : ', row_series.values)
Upvotes: 1
Views: 1001
Reputation: 28644
You can achieve it with dot :
df = pd.DataFrame(
{
'A': [0,0,1],
'B': [1,0,0],
'C': [0,0,0,],
'D': [1,0,1],
'F': [1,0,1]
}
)
df['new_column'] = df.dot(df.columns).str.join(",")
A B C D F new_column
0 0 1 0 1 1 B,D,F
1 0 0 0 0 0
2 1 0 0 1 1 A,D,F
Update: For columns with more than one letter, @BEN_YO suggested a very good solution for that :
df.dot(df.columns+',').str[:-1]
Upvotes: 4
Reputation: 862551
If columns names has more like one character use DataFrame.dot
with add separator to columns names and last remove from right side by Series.str.rstrip
:
df['new_column'] = df.dot(df.columns + ',').str.rstrip(",")
#alternative
#df['new_column'] = (df @ (df.columns + ',')).str.rstrip(",")
print (df)
A B C D F new_column
0 0 1 0 1 1 B,D,F
1 0 0 0 0 0
2 1 0 0 1 1 A,D,F
df = pd.DataFrame({
'col1': [0,0,1],
'col2': [1,0,0],
'col3': [0,0,0,],
'col4': [1,0,1],
'col5': [1,0,1]})
df['new_column'] = df.dot(df.columns + ',').str.rstrip(",")
#alternative
#df['new_column'] = (df @ (df.columns + ',')).str.rstrip(",")
print (df)
col1 col2 col3 col4 col5 new_column
0 0 1 0 1 1 col2,col4,col5
1 0 0 0 0 0
2 1 0 0 1 1 col1,col4,col5
Alternative solution:
cols = df.columns.to_numpy()
df["new_column"] = [', '.join(cols[x]) for x in df.to_numpy().astype(bool)]
Performance:
First solution from sammywemmy
cannot be used, because 50 columns, so there are some columns with 2 or more letters. Also solution of footfalcon
create lists, so also dont tested.
df = pd.DataFrame({
'A': [0,0,1],
'B': [1,0,0],
'C': [0,0,0,],
'D': [1,0,1],
'E': [1,0,1]})
[30000 rows x 50 columns]
df = pd.concat([df] * 10, ignore_index=True, axis=1)
df = pd.concat([df] * 10000, ignore_index=True).add_prefix('col')
Fastest is list comprehension solution, but only for 10ms in sample data, then are really fast dot
solution and last are apply
solutions:
In [70]: %%timeit
...: cols = df.columns.to_numpy()
...: df["new_column"] = [', '.join(cols[x]) for x in df.to_numpy().astype(bool)]
...:
128 ms ± 443 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
#for testing are values converted to boolean (else test fail)
In [72]: %timeit df['new_column'] = df.astype(bool).dot(df.columns + ',').str.rstrip(",")
138 ms ± 1.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#Dishin H Goyani
In [73]: %timeit df["New_column"] = df.apply(lambda x: ','.join(df.columns[x==1]), axis=1)
3.98 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Akshay Sehgal
In [75]: %timeit df['new_column'] = df.apply(lambda x: ', '.join(list(x[x!=0].index)), axis=1)
11 s ± 349 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Rajith Thennakoon
In [78]: %%timeit
...: df["new_column"] = df.apply(lambda x: (pd.DataFrame(x[x==1]).index.values),axis=1)
...: df["new_column"] = df["new_column"].apply(lambda x: ','.join(map(str, x)))
...:
...:
25.9 s ± 709 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 1
Reputation: 19322
If you have python >= 3.5, you can use the matmul operator to do a dot product as -
df['new_column'] = (df @ df.columns).str.join(', ')
A B C D E new_column
0 0 1 0 1 1 B, D, E
1 0 0 0 0 0
2 1 0 0 1 1 A, D, E
Or you could use apply
to solve this over axis=1 as following -
df['new_column'] = df.apply(lambda x: ', '.join(list(x[x!=0].index)), axis=1)
A B C D E new_column
0 0 1 0 1 1 B, D, E
1 0 0 0 0 0
2 1 0 0 1 1 A, D, E
Upvotes: 0
Reputation: 7693
You can use apply
with lambda
function on axis=1
df["New_column"] = df.apply(lambda x: ','.join(df.columns[x==1]), axis=1)
df
A B C D F New_column
0 0 1 0 1 1 B,D,F
1 0 0 0 0 0
2 1 0 0 1 1 A,D,F
Upvotes: 0
Reputation: 4130
Try this method.
df = pd.DataFrame({"A":[0,0,1],"B":[1,0,0],"C":[0,0,0],"D":[1,0,1],"F":[1,0,1]})
df["new_column"] = df.apply(lambda x: (pd.DataFrame(x[x==1]).index.values),axis=1)
df["new_column"] = df["new_column"].apply(lambda x: ','.join(map(str, x)))
output
A B C D F new_column
0 0 1 0 1 1 B,D,F
1 0 0 0 0 0
2 1 0 0 1 1 A,D,F
Upvotes: 0
Reputation: 619
Not sure if this is the best solution, but it gets the job done:
import pandas as pd
df = pd.DataFrame(
{
'A': [0,0,1],
'B': [1,0,0],
'C': [0,0,0,],
'D': [1,0,1],
'F': [1,0,1]
}
)
df1 = df.T
new_cells = []
for c in df1.columns:
new_cells.append(df1[df1[c] == 1].index.tolist())
df['New_column'] = new_cells
Output:
A B C D F New_column
0 0 1 0 1 1 [B, D, F]
1 0 0 0 0 0 []
2 1 0 0 1 1 [A, D, F]
Upvotes: 0