Reputation: 1193
This is my dataframe
color string col_new string.1 string.2
0 red abc qwer abc poi
1 blue xyz zxcv 123 xyz
2 green pqr uyit tzv pqr
3 pink lmn nbtw lmn lmn
For each color(which is unique in my case), I want to take value in string
column, and find the first occurrence of that value in the same row and return the column name where I found the value
For eg.
row 0 should return string.1
row 1 should return string.2
row 2 should return string.2
row 3 should return string.1
I found this example but its implementing on a single series or column, I want this to search value across all columns in that row.
Upvotes: 1
Views: 623
Reputation: 863291
Select all columns by DataFrame.filter
, compare by column string
and get columns names by first True
s by DataFrame.idxmax
:
print (df.filter(regex='string\.\d'))
string.1 string.2
0 abc poi
1 123 xyz
2 tzv pqr
3 lmn lmn
s = df.filter(regex='string\.\d').eq(df['string'], axis=0).idxmax(axis=1)
print (s)
0 string.1
1 string.2
2 string.2
3 string.1
dtype: object
If want select all columns without first 2 use DataFrame.iloc
:
s = df.iloc[:, 2:].eq(df['string'], axis=0).idxmax(axis=1)
print (s)
0 string.1
1 string.2
2 string.2
3 string.1
dtype: object
If possible some row no match is possible use numpy.where
with DataFrame.any
for avoid return first False
columns:
print (df)
color string col_new string.1 string.2
0 red abc qwer abc poi
1 blue xyz zxcv 123 xyz
2 green pqr uyit tzv pqr
3 pink lmn nbtw lmn lmn
3 pink lmn nbtw ttt rrr <- no lmn values in another columns
mask = df.filter(regex='string\.\d').eq(df['string'], axis=0)
out = np.where(mask.any(axis=1), mask.idxmax(axis=1), np.nan)
print (out)
['string.1' 'string.2' 'string.2' 'string.1' nan]
Original solution return wrong output:
print (df.filter(regex='string\.\d').eq(df['string'], axis=0).idxmax(axis=1))
0 string.1
1 string.2
2 string.2
3 string.1
3 color
dtype: object
EDIT:
print (df)
color string col_new string.1 string.2
0 red abc qwer abctyrf poi
1 blue xyz zxcv 123 xyz
2 green pqr uyit tzv pqr
3 pink lmn nbtw lmn lmn
#default index
df = df.reset_index(drop=True)
#column for test
st = df['string']
#for eac hrow is tested subtring by st
s = df.filter(regex='string\.\d').apply(lambda x: x.str.contains(st.loc[x.name]), axis=1)
print (s)
string.1 string.2
0 True False
1 False True
2 False True
3 True True
Upvotes: 2