Reputation: 1142
This is the first data frame (df_dic_1):
UMLS                                   snomed
C0027497/Nausea /Sign or Symptom       Nausea (finding)[FN/422587007]
C0151786 / Muscle/Sign or Symptom      Muscle weakness [(finding) /FN/26544005]
C2127305 /bitter/ Sign or Symptom      ?
NA                                     NA
I created a dictionary from it using the following code:
# keep the two columns and replace missing values with 0 (avoids chained inplace fillna)
df_dic_1 = df_dic_1[['UMLS', 'snomed']].fillna(0)
# map each UMLS string to its snomed string
equiv_snomed = df_dic_1.set_index('UMLS')['snomed'].to_dict()
Now, for data frame B:
id   symptom    UMLS
1    nausea     C0027497/Nausea /Sign or Symptom
2    muscle     C0151786 / Muscle/Sign or Symptom
3    headache
4    pain
5    bitter     C2127305 /bitter/ Sign or Symptom
For every value in the "UMLS" column that is a key in the dictionary, I want to add a "Snomed" column holding the corresponding "snomed" value from the dictionary. So data frame C should look like this:
id   symptom    UMLS                                   Snomed
1    nausea     C0027497/Nausea /Sign or Symptom       Nausea (finding)[FN/422]
2    muscle     C0151786 / Muscle/Sign or Symptom      Muscle [(fi)/FN/25]
3    headache
4    pain
5    bitter     C2127305 /bitter/ Sign or Symptom      ?
Any help? thanks
Upvotes: 0
Views: 1456
Reputation: 4199
You could use the apply function on each element of your UMLS column and look up the value in the dictionary equiv_snomed. If the key is not in the dictionary, you can return np.nan (with numpy imported as np).
If your data frame B is named df2, then:
df2['Snomed'] = df2['UMLS'].apply(lambda x: equiv_snomed.get(x, np.nan))
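As a minimal, self-contained sketch of the same idea: the dictionary entries below are copied from the question's first data frame, while the three-row df2 is only a hypothetical stand-in for data frame B. Rows whose UMLS value is missing, or is not a key in the dictionary, fall through to the np.nan default.

```python
import numpy as np
import pandas as pd

# Lookup table with two entries taken from the question's first data frame.
equiv_snomed = {
    "C0027497/Nausea /Sign or Symptom": "Nausea (finding)[FN/422587007]",
    "C2127305 /bitter/ Sign or Symptom": "?",
}

# Hypothetical stand-in for data frame B; None marks a missing UMLS value.
df2 = pd.DataFrame({
    "symptom": ["nausea", "headache", "bitter"],
    "UMLS": [
        "C0027497/Nausea /Sign or Symptom",
        None,
        "C2127305 /bitter/ Sign or Symptom",
    ],
})

# dict.get returns the default (np.nan) whenever the key is absent.
df2["Snomed"] = df2["UMLS"].apply(lambda x: equiv_snomed.get(x, np.nan))
print(df2)
```

Note that the lookup is an exact string match, so the spacing inside each UMLS value has to be identical in the dictionary and in the data frame.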
Upvotes: 2
Reputation: 324
See EdChum's answer to this Stack Overflow question.
As applied to your situation, it would look like:
import pandas as pd
# create dictionary
d = {'umls1':'snomed1','umls2':'snomed2','umls3':'snomed3'}
# create empty dataframe
columns = ['symptom','umls','snomed']
df = pd.DataFrame(columns = columns)
# fill it with symptoms and with umls, with some umls NULL
df['symptom'] = ['nausea','muscle','headache','pain','bitter']
# .ix was removed in pandas 1.0; .loc does the same label-based assignment here
df.loc[0,'umls'] = 'umls1'
df.loc[1,'umls'] = 'umls2'
df.loc[4,'umls'] = 'umls3'
# add a third column with snomed values from dictionary
df['snomed'] = df['umls'].map(d)
Giving the following output:
df.head()
Out[21]:
    symptom   umls   snomed
0    nausea  umls1  snomed1
1    muscle  umls2  snomed2
2  headache    NaN      NaN
3      pain    NaN      NaN
4    bitter  umls3  snomed3
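The same map call can be tried on the strings from the question. This is only a sketch: the names df_b and df_c are hypothetical, and row 2 is assumed to carry the C0151786 code, as it does in the desired data frame C. The optional fillna("") step is an assumption about wanting blank cells instead of NaN.

```python
import pandas as pd

# Dictionary built from the question's first data frame (UMLS -> snomed).
equiv_snomed = {
    "C0027497/Nausea /Sign or Symptom": "Nausea (finding)[FN/422587007]",
    "C0151786 / Muscle/Sign or Symptom": "Muscle weakness [(finding) /FN/26544005]",
    "C2127305 /bitter/ Sign or Symptom": "?",
}

# Hypothetical reconstruction of data frame B; None marks an empty UMLS cell.
df_b = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "symptom": ["nausea", "muscle", "headache", "pain", "bitter"],
    "UMLS": [
        "C0027497/Nausea /Sign or Symptom",
        "C0151786 / Muscle/Sign or Symptom",
        None,
        None,
        "C2127305 /bitter/ Sign or Symptom",
    ],
})

# map looks up every UMLS value in the dictionary; unmatched values become NaN.
df_c = df_b.copy()
df_c["Snomed"] = df_c["UMLS"].map(equiv_snomed)

# Optional: show blanks instead of NaN, as in the desired output.
df_c["Snomed_blank"] = df_c["Snomed"].fillna("")
print(df_c)
```

Compared with apply and a lambda, map takes the dictionary directly, so no explicit default is needed for missing keys.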
Upvotes: 2