Murtaza Haji
Murtaza Haji

Reputation: 1193

How to pick first occurence of value and get column name

This is my dataframe

     color    string   col_new   string.1    string.2
0    red       abc      qwer       abc         poi
1   blue       xyz      zxcv       123         xyz
2  green       pqr      uyit       tzv         pqr
3   pink       lmn      nbtw       lmn         lmn

For each color(which is unique in my case), I want to take value in string column, and find the first occurrence of that value in the same row and return the column name where I found the value

For eg.

row 0 should return string.1 row 1 should return string.2 row 2 should return string.2 row 3 should return string.1

I found this example but its implementing on a single series or column, I want this to search value across all columns in that row.

Upvotes: 1

Views: 623

Answers (1)

jezrael
jezrael

Reputation: 863291

Select all columns by DataFrame.filter, compare by column string and get columns names by first Trues by DataFrame.idxmax:

print (df.filter(regex='string\.\d'))
  string.1 string.2
0      abc      poi
1      123      xyz
2      tzv      pqr
3      lmn      lmn

s = df.filter(regex='string\.\d').eq(df['string'], axis=0).idxmax(axis=1)

print (s)
0    string.1
1    string.2
2    string.2
3    string.1
dtype: object

If want select all columns without first 2 use DataFrame.iloc:

s = df.iloc[:, 2:].eq(df['string'], axis=0).idxmax(axis=1)
print (s)
0    string.1
1    string.2
2    string.2
3    string.1
dtype: object

If possible some row no match is possible use numpy.where with DataFrame.any for avoid return first False columns:

print (df)
   color string col_new string.1 string.2
0    red    abc    qwer      abc      poi
1   blue    xyz    zxcv      123      xyz
2  green    pqr    uyit      tzv      pqr
3   pink    lmn    nbtw      lmn      lmn
3   pink    lmn    nbtw      ttt      rrr <- no lmn values in another columns


mask = df.filter(regex='string\.\d').eq(df['string'], axis=0)
out = np.where(mask.any(axis=1), mask.idxmax(axis=1), np.nan)
print (out)
['string.1' 'string.2' 'string.2' 'string.1' nan]

Original solution return wrong output:

print (df.filter(regex='string\.\d').eq(df['string'], axis=0).idxmax(axis=1))
0    string.1
1    string.2
2    string.2
3    string.1
3       color
dtype: object

EDIT:

print (df)
   color string col_new string.1 string.2
0    red    abc    qwer  abctyrf      poi
1   blue    xyz    zxcv      123      xyz
2  green    pqr    uyit      tzv      pqr
3   pink    lmn    nbtw      lmn      lmn

#default index
df = df.reset_index(drop=True)
#column for test
st = df['string']
#for eac hrow is tested subtring by st
s = df.filter(regex='string\.\d').apply(lambda x: x.str.contains(st.loc[x.name]), axis=1)
print (s)
   string.1  string.2
0      True     False
1     False      True
2     False      True
3      True      True

Upvotes: 2

Related Questions