Suresh Resh
Suresh Resh

Reputation: 47

How to loop through pandas and match a condition

Input table : Input table

Output table : Output table

I cant seem to figure it out.I need to group the columns with value 1 into a new column "New_column". Can someone help me. This is the code that I have tried.

for (index_label, row_series) in data.iterrows():
print('Row Index label : ', index_label)
print('Row Content as Series : ', row_series.values)

Upvotes: 1

Views: 1001

Answers (6)

sammywemmy
sammywemmy

Reputation: 28644

You can achieve it with dot :

    df = pd.DataFrame(
    {
        'A': [0,0,1],
        'B': [1,0,0],
        'C': [0,0,0,],
        'D': [1,0,1],
        'F': [1,0,1]
    }
)
df['new_column'] = df.dot(df.columns).str.join(",")



    A   B   C   D   F   new_column
0   0   1   0   1   1   B,D,F
1   0   0   0   0   0   
2   1   0   0   1   1   A,D,F

Update: For columns with more than one letter, @BEN_YO suggested a very good solution for that :

df.dot(df.columns+',').str[:-1]

Upvotes: 4

jezrael
jezrael

Reputation: 862551

If columns names has more like one character use DataFrame.dot with add separator to columns names and last remove from right side by Series.str.rstrip:

df['new_column'] = df.dot(df.columns + ',').str.rstrip(",")
#alternative
#df['new_column'] = (df @ (df.columns + ',')).str.rstrip(",")

print (df)
   A  B  C  D  F new_column
0  0  1  0  1  1      B,D,F
1  0  0  0  0  0           
2  1  0  0  1  1      A,D,F

df = pd.DataFrame({
        'col1': [0,0,1],
        'col2': [1,0,0],
        'col3': [0,0,0,],
        'col4': [1,0,1],
        'col5': [1,0,1]})


df['new_column'] = df.dot(df.columns + ',').str.rstrip(",")
#alternative
#df['new_column'] = (df @ (df.columns + ',')).str.rstrip(",")

print (df)
   col1  col2  col3  col4  col5      new_column
0     0     1     0     1     1  col2,col4,col5
1     0     0     0     0     0                
2     1     0     0     1     1  col1,col4,col5

Alternative solution:

cols = df.columns.to_numpy()
df["new_column"] = [', '.join(cols[x]) for x in df.to_numpy().astype(bool)]

Performance:

First solution from sammywemmy cannot be used, because 50 columns, so there are some columns with 2 or more letters. Also solution of footfalcon create lists, so also dont tested.

df = pd.DataFrame({
        'A': [0,0,1],
        'B': [1,0,0],
        'C': [0,0,0,],
        'D': [1,0,1],
        'E': [1,0,1]})

[30000 rows x 50 columns]
df = pd.concat([df] * 10, ignore_index=True, axis=1)
df = pd.concat([df] * 10000, ignore_index=True).add_prefix('col')

Fastest is list comprehension solution, but only for 10ms in sample data, then are really fast dot solution and last are apply solutions:

In [70]: %%timeit
    ...: cols = df.columns.to_numpy()
    ...: df["new_column"] = [', '.join(cols[x]) for x in df.to_numpy().astype(bool)]
    ...: 
128 ms ± 443 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

#for testing are values converted to boolean (else test fail)
In [72]: %timeit df['new_column'] = df.astype(bool).dot(df.columns + ',').str.rstrip(",")
138 ms ± 1.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

#Dishin H Goyani
In [73]: %timeit df["New_column"] = df.apply(lambda x: ','.join(df.columns[x==1]), axis=1)
3.98 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Akshay Sehgal
In [75]: %timeit df['new_column'] = df.apply(lambda x: ', '.join(list(x[x!=0].index)), axis=1)
11 s ± 349 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Rajith Thennakoon
In [78]: %%timeit
    ...: df["new_column"] = df.apply(lambda x: (pd.DataFrame(x[x==1]).index.values),axis=1)
    ...: df["new_column"] = df["new_column"].apply(lambda x: ','.join(map(str, x)))
    ...: 
    ...: 
25.9 s ± 709 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Upvotes: 1

Akshay Sehgal
Akshay Sehgal

Reputation: 19322

If you have python >= 3.5, you can use the matmul operator to do a dot product as -

df['new_column'] = (df @ df.columns).str.join(', ')
   A  B  C  D  E new_column
0  0  1  0  1  1    B, D, E
1  0  0  0  0  0           
2  1  0  0  1  1    A, D, E

Or you could use apply to solve this over axis=1 as following -

df['new_column'] = df.apply(lambda x: ', '.join(list(x[x!=0].index)), axis=1)
   A  B  C  D  E new_column
0  0  1  0  1  1    B, D, E
1  0  0  0  0  0           
2  1  0  0  1  1    A, D, E

Upvotes: 0

Dishin H Goyani
Dishin H Goyani

Reputation: 7693

You can use apply with lambda function on axis=1

df["New_column"] = df.apply(lambda x: ','.join(df.columns[x==1]), axis=1)
df
   A  B  C  D  F New_column
0  0  1  0  1  1      B,D,F
1  0  0  0  0  0
2  1  0  0  1  1      A,D,F

Upvotes: 0

Rajith Thennakoon
Rajith Thennakoon

Reputation: 4130

Try this method.

df = pd.DataFrame({"A":[0,0,1],"B":[1,0,0],"C":[0,0,0],"D":[1,0,1],"F":[1,0,1]})

df["new_column"] = df.apply(lambda x: (pd.DataFrame(x[x==1]).index.values),axis=1)
df["new_column"] = df["new_column"].apply(lambda x: ','.join(map(str, x)))

output

   A  B  C  D  F new_column
0  0  1  0  1  1      B,D,F
1  0  0  0  0  0           
2  1  0  0  1  1      A,D,F

Upvotes: 0

footfalcon
footfalcon

Reputation: 619

Not sure if this is the best solution, but it gets the job done:

import pandas as pd

df = pd.DataFrame(
    {
        'A': [0,0,1],
        'B': [1,0,0],
        'C': [0,0,0,],
        'D': [1,0,1],
        'F': [1,0,1]
    }
)

df1 = df.T
new_cells = []
for c in df1.columns:
    new_cells.append(df1[df1[c] == 1].index.tolist())
df['New_column'] = new_cells

Output:

A   B   C   D   F   New_column
0   0   1   0   1   1   [B, D, F]
1   0   0   0   0   0   []
2   1   0   0   1   1   [A, D, F]

Upvotes: 0

Related Questions