Reputation: 1085
I have DataFrame which looks something like below:
Q5 | Q10 | Q41 | item
a | b | c | Q5
d | e | f | Q10
g | h | i | Q571
j | k | l | Q23340
m | n | o | Q41
h | p | s | Q10
Where Q5, Q10, Q41, item are column names of the DataFrame. I want to add one more column "name" which will have value of the column where value of column "item" matched with the column name. So I want it to look like as below:
Q5 | Q10 | Q41 | item | name
a | b | c | Q5 | a
d | e | f | Q10 | e
g | h | i | Q571 | NA
j | k | l | Q23340 | NA
m | n | o | Q41 | o
h | p | s | Q10 | p
The problem here is, there are more items than columns. So not all the values in column item exist as columns which causes keyError. I tried doing like below:
df['col_exist'] = [(col in df.columns) for col in df.item]
df['name'] = np.where(df['col_exist']==True, df[df.item], np.nan)
And I get error as:
KeyError: "['Q571', 'Q23340'] not in index"
I also tried using df.apply as below:
df['name'] = np.where(df['col_exist']==True, df.apply(lambda x: x[x.item], axis=1), np.nan)
But I am getting error as below:
KeyError: ('Q571', 'occurred at index 2')
I am not sure why it is trying to access the column which does not exist despite of placing col_exit check there.
Can someone please help me to resolve this issue?
Upvotes: 1
Views: 1383
Reputation: 30605
You can filter item column based on columns then use lookup i.e
df['new'] = df['item'].apply(lambda x : x if x in df.columns else np.nan)
or
df['new'] = np.where(df['item'].isin(df.columns), df['item'], np.nan)
df['name'] = np.nan
df['name'] = df.lookup(df.index,df['new'].fillna('name'))
Output:
Q5 Q10 Q41 item new name 0 a b c Q5 Q5 a 1 d e f Q10 Q10 e 2 g h i Q571 NaN NaN 3 j k l Q23340 NaN NaN 4 m n o Q41 Q41 o 5 h p s Q10 Q10 p
To remove new column df = df.drop('new',1)
To make your approach work instead of df[df.item]
use df['item']
df['name'] = np.where(df['col_exist']==True, df['item'], np.nan)
Upvotes: 1