Nilakshi Naphade
Nilakshi Naphade

Reputation: 1085

Python KeyError: pandas: match row value to column name/key where some keys are missing

I have DataFrame which looks something like below:

Q5 | Q10 | Q41 | item 
a  | b   | c   |  Q5  
d  | e   | f   |  Q10  
g  | h   | i   |  Q571    
j  | k   | l   |  Q23340  
m  | n   | o   |  Q41  
h  | p   | s   |  Q10  

Where Q5, Q10, Q41, item are column names of the DataFrame. I want to add one more column "name" which will have value of the column where value of column "item" matched with the column name. So I want it to look like as below:

Q5 | Q10 | Q41 | item    | name
a  | b   | c   |  Q5     |  a
d  | e   | f   |  Q10    |  e
g  | h   | i   |  Q571   |  NA
j  | k   | l   |  Q23340 |  NA
m  | n   | o   |  Q41    |  o
h  | p   | s   |  Q10    |  p

The problem here is, there are more items than columns. So not all the values in column item exist as columns which causes keyError. I tried doing like below:

df['col_exist'] = [(col in df.columns) for col in df.item]
df['name'] = np.where(df['col_exist']==True, df[df.item], np.nan)

And I get error as:

KeyError: "['Q571', 'Q23340'] not in index"

I also tried using df.apply as below:

 df['name'] = np.where(df['col_exist']==True, df.apply(lambda x: x[x.item], axis=1), np.nan)

But I am getting error as below:

KeyError: ('Q571', 'occurred at index 2')

I am not sure why it is trying to access the column which does not exist despite of placing col_exit check there.

Can someone please help me to resolve this issue?

Upvotes: 1

Views: 1383

Answers (1)

Bharath M Shetty
Bharath M Shetty

Reputation: 30605

You can filter item column based on columns then use lookup i.e

df['new'] = df['item'].apply(lambda x : x if x in df.columns else np.nan)

or

df['new'] = np.where(df['item'].isin(df.columns), df['item'], np.nan)
df['name'] = np.nan
df['name'] = df.lookup(df.index,df['new'].fillna('name'))

Output:

    Q5    Q10    Q41    item  new   name
0  a     b      c         Q5   Q5    a  
1  d     e      f        Q10  Q10    e   
2  g     h      i       Q571  NaN    NaN
3  j     k      l     Q23340  NaN    NaN
4  m     n      o        Q41  Q41    o   
5  h     p      s        Q10  Q10    p   

To remove new column df = df.drop('new',1)

To make your approach work instead of df[df.item] use df['item']

df['name'] = np.where(df['col_exist']==True, df['item'], np.nan)

Upvotes: 1

Related Questions