Ferit
Ferit

Reputation: 648

Update values with earlier date based on list by another dataframe in Python

I want to update the column Value of the dataframe dateEANdf with the values of the dataframe valueEANdf but only those rows with the earlier date.

An excerpt of valueEANdf looks as follows:

EAN-Unique  Value
3324324   3.0
asd2343   2.0
Xjkhfsd   1.2
5234XAR   4.5
3434343   2.6

An excerpt of dateEANdf looks as follows, it contains each EAN twice with an earlier and a later date.

EAN-Unique  Date         Start  Value 
3324324     2018-06-01   yes
3324324     2019-04-30   no
asd2343     2015-03-23   yes
asd2343     2015-07-11   no 
Xjkhfsd     1999-04-12   yes 
Xjkhfsd     2001-02-01   no 
5234XAR     2000-12-13   yes
5234XAR     2013-12-13   no
3434343     1972-05-23   yes   
3434343     1980-11-01   no 

The updated dateEANdf should be as follows:

EAN-Unique  Date         Start  Value 
3324324     2018-06-01   yes    3.0
3324324     2019-04-30   no
asd2343     2015-03-23   yes    2.0
asd2343     2015-07-11   no 
Xjkhfsd     1999-04-12   yes    1.2
Xjkhfsd     2001-02-01   no 
5234XAR     2000-12-13   yes    4.5
5234XAR     2013-12-13   no
3434343     1972-05-23   yes    2.6
3434343     1980-11-01   no 

My attempt was

dateEANdf.loc[ (dateEANdf['EAN-Unique'].isin(valueEANdf.unique().tolist())) & ( dateEANdf['Start'] == 'yes') , 'Value' ]  = valueEANdf['Value']

However, this puts the values "somewhere" like randomly but not to the earlier date. How to fix that?

Thanks.

Upvotes: 1

Views: 70

Answers (3)

zabop
zabop

Reputation: 7832

import pandas as pd
import numpy as np

You can also do:

dateEANdf['Value']=dateEANdf['EAN-Unique'].apply(
    lambda row: float(valueEANdf[valueEANdf['EAN-Unique']==row].Value))

which will give you:

  EAN-Unique          Date Start  Value
0    3324324    2018-06-01   yes    3.0
1    3324324    2019-04-30    no    3.0
2    asd2343    2015-03-23   yes    2.0
3    asd2343    2015-07-11    no    2.0
4    Xjkhfsd    1999-04-12   yes    1.2
5    Xjkhfsd    2001-02-01    no    1.2
6    5234XAR    2000-12-13   yes    4.5
7    5234XAR    2013-12-13    no    4.5
8    3434343    1972-05-23   yes    2.6
9    3434343    1980-11-01    no    2.6

Drop every second value in the value, relying on this thread: Pandas every nth row:

dateEANdf.loc[1::2,'Value']=np.nan

which will result in:

  EAN-Unique          Date Start  Value
0    3324324    2018-06-01   yes    3.0
1    3324324    2019-04-30    no    NaN
2    asd2343    2015-03-23   yes    2.0
3    asd2343    2015-07-11    no    NaN
4    Xjkhfsd    1999-04-12   yes    1.2
5    Xjkhfsd    2001-02-01    no    NaN
6    5234XAR    2000-12-13   yes    4.5
7    5234XAR    2013-12-13    no    NaN
8    3434343    1972-05-23   yes    2.6
9    3434343    1980-11-01    no    NaN

Upvotes: 1

David Erickson
David Erickson

Reputation: 16673

You could do a merge and then update values with np.where:

# If 'Value' is not already in 'dateEANdf', then remove `dateEANdf.drop('Value', axis=1)`
dateEANdf = dateEANdf.drop('Value', axis=1).merge(valueEANdf, how='left', on='EAN-Unique')
dateEANdf['Value'] = np.where(dateEANdf['Start'] == 'no', np.nan, dateEANdf['Value'])
dateEANdf
Out[1]: 
  EAN-Unique        Date Start  Value
0    3324324  2018-06-01   yes    3.0
1    3324324  2019-04-30    no    NaN
2    asd2343  2015-03-23   yes    2.0
3    asd2343  2015-07-11    no    NaN
4    Xjkhfsd  1999-04-12   yes    1.2
5    Xjkhfsd  2001-02-01    no    NaN
6    5234XAR  2000-12-13   yes    4.5
7    5234XAR  2013-12-13    no    NaN
8    3434343  1972-05-23   yes    2.6
9    3434343  1980-11-01    no    NaN

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150735

Try loc slicing, then map:

s = dateEANdf['Start'].eq('yes')
dateEANdf.loc[s, 'Value'] = (dateEANdf.loc[s, 'EAN-Unique']
                                 .map(valueEANdf.set_index('EAN-Unique')['Value'])
                            )

Or map the whole series then where:

dateEANdf['Value'] = (dateEANdf['EAN-Unique'].map(valueEANdf.set_index('EAN-Unique')['Value'])
                          .where(dateEANdf['Start'].eq('yes'))
                     )

Output:

  EAN-Unique        Date Start  Value
0    3324324  2018-06-01   yes    3.0
1    3324324  2019-04-30    no    NaN
2    asd2343  2015-03-23   yes    2.0
3    asd2343  2015-07-11    no    NaN
4    Xjkhfsd  1999-04-12   yes    1.2
5    Xjkhfsd  2001-02-01    no    NaN
6    5234XAR  2000-12-13   yes    4.5
7    5234XAR  2013-12-13    no    NaN
8    3434343  1972-05-23   yes    2.6
9    3434343  1980-11-01    no    NaN

Upvotes: 2

Related Questions