John
John

Reputation: 521

dataframe merge with missing data

I have 2 dataframes:

df.head()
Out[2]: 
   Unnamed: 0 Symbol        Date      Close
0        4061      A  2016-01-13  36.515889
1        4062      A  2016-01-14  36.351784
2        4063      A  2016-01-15  36.351784
3        4064      A  2016-01-19  36.590483
4        4065      A  2016-01-20  35.934062

and

dfw.head()
Out[3]: 
         Symbol       Weight
0  A              (0.000002)
1  AA              0.000112 
2  AAC            (0.000004)
3  AAL             0.000006 
4  AAMC            0.000002 

ISSUE: Not every symbol if df will have a weight in dfw. If it does not I want to drop it from my new dataframe (all dates of it). If the symbol is in dfw I want to merge the weight in with df so that each row has symbol, date, close and weight. I have tried the following but get NaN values. I also am not sure how to remove all symbols with no weights even if I was successful.

dfall = df.merge(dfw, on='Symbol', how='left')

dfall.head()
Out[14]: 
   Unnamed: 0 Symbol        Date      Close Weight
0        4061      A  2016-01-13  36.515889    NaN
1        4062      A  2016-01-14  36.351784    NaN
2        4063      A  2016-01-15  36.351784    NaN
3        4064      A  2016-01-19  36.590483    NaN
4        4065      A  2016-01-20  35.934062    NaN

Upvotes: 1

Views: 100

Answers (1)

Alexander
Alexander

Reputation: 109726

df_all = df[df.Symbol.isin(dfw.Symbol.unique())].merge(dfw, how='left', on='Symbol')

I am not sure why you are getting NaN values. Perhaps you have spaces in you your symbols? You can clean them via: dfw['Symbol'] = dfw.Symbol.str.strip() You would need to do the same for df.

>>> df_all
Unnamed: 0 Symbol        Date      Close      Weight
   0  4061      A  2016-01-13  36.515889  (0.000002)
   1  4062      A  2016-01-14  36.351784  (0.000002)
   2  4063      A  2016-01-15  36.351784  (0.000002)
   3  4064      A  2016-01-19  36.590483  (0.000002)
   4  4065      A  2016-01-20  35.934062  (0.000002)

Upvotes: 3

Related Questions