Sam

Reputation: 1256

How to check multiple columns of a dataframe for a value and get the column names and the value as lists in new columns?

I am trying to check whether the columns of a pandas dataframe contain values from a list.

    df = pd.DataFrame({'id':[np.nan,2,3,4,5,6],
                      'column1':[np.nan,10,15,20,25,25],
                      'column2':[np.nan,4,6,8,10,np.nan],
                      'column3':[np.nan,6,9,12,np.nan,15],
                      'column4':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]})

    lst = ['4','6','10','25']
    cols = ['column1','column2','column3','column4']

I want to check the values of lst against multiple columns of df and, where a value exists in df, get the names of the columns that contain it.

The result I am looking for is: [image not available]

The result I am getting is only True and False values.

I was able to get result, but not result1. I got result with:

    m1 = df[cols].isin(lst)
    m2 = pd.DataFrame((~m1.any(axis=1)).to_numpy()[:, None] & df[cols].eq(2).to_numpy(),
                      index=m1.index, columns=m1.columns)

    m = (m1 | m2)

    df['result'] = m.where(m).stack().reset_index().groupby('level_0')['level_1'].agg(list)

Upvotes: 1

Views: 148

Answers (1)

Laurent

Reputation: 13518

Here is one way to do it:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {
            "id": [np.nan, 2, 3, 4, 5, 6],
            "column1": [np.nan, 10, 15, 20, 25, 25],
            "column2": [np.nan, 4, 6, 8, 10, np.nan],
            "column3": [np.nan, 6, 9, 12, np.nan, 15],
            "column4": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        }
    )

    # Setup: one empty list per row for the values and for the column names
    lst = ["4", "6", "10", "15", "25"]
    cols = ["column1", "column2", "column3", "column4"]
    result = [[] for _ in range(df.shape[0])]
    result1 = [[] for _ in range(df.shape[0])]

    # Iterate and record results
    for col in cols:
        for i, x in enumerate(df[col]):
            # Note: the test is against the column itself, not lst, so every
            # non-NaN value is kept (NaN never compares equal, so it is skipped)
            if x in df[col].tolist():
                result[i].append(x)
                result1[i].append(col)

    # Add results to dataframe
    df = df.assign(result=result, result1=result1)
    print(df)
    # Output
        id  column1  column2  ...  column4             result                      result1
    0  NaN      NaN      NaN  ...      NaN                 []                           []
    1  2.0     10.0      4.0  ...      NaN   [10.0, 4.0, 6.0]  [column1, column2, column3]
    2  3.0     15.0      6.0  ...      NaN   [15.0, 6.0, 9.0]  [column1, column2, column3]
    3  4.0     20.0      8.0  ...      NaN  [20.0, 8.0, 12.0]  [column1, column2, column3]
    4  5.0     25.0     10.0  ...      NaN       [25.0, 10.0]           [column1, column2]
    5  6.0     25.0      NaN  ...      NaN       [25.0, 15.0]           [column1, column3]
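
The loop above tests each value against its own column, so lst never enters the check and every non-NaN value is recorded. If the goal is to keep only the values that appear in lst, a variant along the following lines might work. It is a sketch, not a tested drop-in: it assumes the strings in lst are meant to match the float values in df (so they are converted with float first), and the names wanted, mask, matched_values and matched_columns are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "id": [np.nan, 2, 3, 4, 5, 6],
        "column1": [np.nan, 10, 15, 20, 25, 25],
        "column2": [np.nan, 4, 6, 8, 10, np.nan],
        "column3": [np.nan, 6, 9, 12, np.nan, 15],
        "column4": [np.nan] * 6,
    }
)
lst = ["4", "6", "10", "15", "25"]
cols = ["column1", "column2", "column3", "column4"]

# Assumption: the strings in lst stand for numbers, so convert them before
# comparing; isin() would not match the string "4" against the float 4.0.
wanted = [float(v) for v in lst]

# Boolean mask: True where a cell holds one of the wanted values (NaN is False).
mask = df[cols].isin(wanted).to_numpy()
values = df[cols].to_numpy()

# For each row, collect the matching values and the columns they came from.
matched_values = [row[hit].tolist() for row, hit in zip(values, mask)]
matched_columns = [[c for c, flag in zip(cols, hit) if flag] for hit in mask]

# Wrapping in a Series keeps each list as a single object cell.
out = df.assign(
    result=pd.Series(matched_values, index=df.index),
    result1=pd.Series(matched_columns, index=df.index),
)
print(out[["result", "result1"]])
```

With this data, the row at index 3 (values 20, 8 and 12) ends up with two empty lists, because none of its values is in lst.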

Upvotes: 1
