bcal
bcal

Reputation: 55

Python Pandas - Add values from one dataframe to another by matching labels to columns

I know there is an easy, elegant solution to this problem but I am struggling to find it. All I'm trying to do is add a third column to df2 with the corresponding values from df2, based on Date and PN. There may be values in df2 that don't match to df1, and vice versa (fill NaN where there is no match).

df1:

       2017-11-01  2017-11-02  2017-11-03
PN                                                                              
90020       105.0       105.0       105.0
90022       100.0       100.0       100.0 
90061        -3.0        -3.0        -3.0 
90065        30.0        30.0        30.0
90099         2.0         2.0         2.0

df2:

     PN       Date
4   90020 2017-11-01
9   90020 2017-11-02
12  90061 2017-11-01
13  90065 2017-11-02
17  94008 2017-11-03

Desired result:

     PN       Date      Value
4   90020 2017-11-01    105.0
9   90020 2017-11-02    105.0
12  90061 2017-11-01    -3.0
13  90065 2017-11-02    30.0
17  94008 2017-11-03    NaN

Upvotes: 5

Views: 902

Answers (5)

ThisGuyCantEven
ThisGuyCantEven

Reputation: 1267

Here we go. This solution is not listed above (probably because it is hardly readable and involves looping in python which can degrade performance):

def cnvt (i):
    try:
        return i[0]
    except(IndexError):
        return float('nan')

df3 = df2.assign(Value=list(map(cnvt,[
           df1[i][df1["PN"]==j].values 
           for i in list(df1.columns.values[1:]) 
           for j in df2["PN"]][0:len(df2["PN"]-1)]
                 )))

which creates df3:

          Date     PN  Value
4   2017-11-01  90020  105.0
9   2017-11-02  90020  105.0
12  2017-11-01  90061   -3.0
13  2017-11-02  90065   30.0
17  2017-11-03  94008    NaN

Upvotes: 2

Anton vBR
Anton vBR

Reputation: 18906

Another solution would be to create a map of df1 using stack() and then apply it to the zip of the columns.

m = df1.stack()
df2['value'] = pd.Series(list(zip(df2.PN, df2.Date))).map(m).values

#          PN        Date  value
#index                          
#4      90020  2017-11-01  105.0
#9      90020  2017-11-02  105.0
#12     90061  2017-11-01   -3.0
#13     90065  2017-11-02   30.0
#17     94008  2017-11-03    NaN

Time comparison:

jpp's melt and merge: 100 loops, best of 3: 4.41 ms per loop

avbr's stack and map: 100 loops, best of 3: 2.97 ms per loop

ScottB's stack, reindex: 100 loops, best of 3: 3.68 ms per loop

ThisGuy's function,listcomprehension: 100 loops, best of 3: 5.79 ms per loop


Full example:

import pandas as pd

data1 = '''\
PN         2017-11-01  2017-11-02  2017-11-03
90020       105.0       105.0       105.0
90022       100.0       100.0       100.0
90061        -3.0        -3.0        -3.0 
90065        30.0        30.0        30.0
90099         2.0         2.0         2.0'''

data2 = '''\
index     PN       Date
4   90020 2017-11-01
9   90020 2017-11-02
12  90061 2017-11-01
13  90065 2017-11-02
17  94008 2017-11-03
'''

df1 = pd.read_csv(pd.compat.StringIO(data1), sep='\s+').set_index('PN')
df2 = pd.read_csv(pd.compat.StringIO(data2), sep='\s+').set_index('index')

m = df1.stack()
df2['value'] = pd.Series(list(zip(df2.PN, df2.Date))).map(m).values

Upvotes: 3

BENY
BENY

Reputation: 323226

I am thinking about lookup, but it it fail since

KeyError: 'One or more row labels was not found'

#df1.lookup(df2.PN,df2.Date,na_action='ignore')

result = []
for x,y in zip(df2.PN, df2.Date):
    try:
        result.append(df1.get_value(x, y))
    except:
        result.append(np.nan)



df2['V']=result

Upvotes: 0

Scott Boston
Scott Boston

Reputation: 153460

You can do it this way also.

(df1.stack()
   .rename_axis(['PN','Date'])
   .reindex([df2.PN,df2['Date']])
   .reset_index(name='Value')
   .set_axis(df2.index, axis=0, inplace=False))

@AntonVBR suggest using:

(df1.stack()
   .rename_axis(['PN','Date'])
   .reindex([df2.PN,df2['Date']])
   .reset_index(name='Value')
   .set_index(df2.index))

Output:

       PN        Date  Value
4   90020  2017-11-01  105.0
9   90020  2017-11-02  105.0
12  90061  2017-11-01   -3.0
13  90065  2017-11-02   30.0
17  94008  2017-11-03    NaN

Upvotes: 2

jpp
jpp

Reputation: 164623

As mentioned, you can use pd.melt combined with pd.merge.

Just remember to reset_index to promote your index to a column, and then to align column names.

df1 = pd.melt(df.reset_index(), id_vars=df.index.name)\
        .rename(columns={'variable': 'Date'})

res = pd.merge(df2, df1, how='left')

#       PN        Date  value
# 0  90020  2017-11-01  105.0
# 1  90020  2017-11-02  105.0
# 2  90061  2017-11-01   -3.0
# 3  90065  2017-11-02   30.0
# 4  94008  2017-11-03    NaN

To maintain df2 index:

res = df2.reset_index()\
         .merge(df1, how='left')\
         .set_index('index')

#           PN        Date  value
# index                          
# 4      90020  2017-11-01  105.0
# 9      90020  2017-11-02  105.0
# 12     90061  2017-11-01   -3.0
# 13     90065  2017-11-02   30.0
# 17     94008  2017-11-03    NaN

Upvotes: 4

Related Questions