Reputation: 648
I want to update the column Value
of the dataframe dateEANdf
with the values of the dataframe valueEANdf
but only those rows with the earlier date.
An excerpt of valueEANdf
looks as follows:
EAN-Unique Value
3324324 3.0
asd2343 2.0
Xjkhfsd 1.2
5234XAR 4.5
3434343 2.6
An excerpt of dateEANdf
looks as follows, it contains each EAN twice with an earlier and a later date.
EAN-Unique Date Start Value
3324324 2018-06-01 yes
3324324 2019-04-30 no
asd2343 2015-03-23 yes
asd2343 2015-07-11 no
Xjkhfsd 1999-04-12 yes
Xjkhfsd 2001-02-01 no
5234XAR 2000-12-13 yes
5234XAR 2013-12-13 no
3434343 1972-05-23 yes
3434343 1980-11-01 no
The updated dateEANdf
should be as follows:
EAN-Unique Date Start Value
3324324 2018-06-01 yes 3.0
3324324 2019-04-30 no
asd2343 2015-03-23 yes 2.0
asd2343 2015-07-11 no
Xjkhfsd 1999-04-12 yes 1.2
Xjkhfsd 2001-02-01 no
5234XAR 2000-12-13 yes 4.5
5234XAR 2013-12-13 no
3434343 1972-05-23 yes 2.6
3434343 1980-11-01 no
My attempt was
dateEANdf.loc[ (dateEANdf['EAN-Unique'].isin(valueEANdf.unique().tolist())) & ( dateEANdf['Start'] == 'yes') , 'Value' ] = valueEANdf['Value']
However, this puts the values "somewhere" like randomly but not to the earlier date. How to fix that?
Thanks.
Upvotes: 1
Views: 70
Reputation: 7832
import pandas as pd
import numpy as np
You can also do:
dateEANdf['Value']=dateEANdf['EAN-Unique'].apply(
lambda row: float(valueEANdf[valueEANdf['EAN-Unique']==row].Value))
which will give you:
EAN-Unique Date Start Value
0 3324324 2018-06-01 yes 3.0
1 3324324 2019-04-30 no 3.0
2 asd2343 2015-03-23 yes 2.0
3 asd2343 2015-07-11 no 2.0
4 Xjkhfsd 1999-04-12 yes 1.2
5 Xjkhfsd 2001-02-01 no 1.2
6 5234XAR 2000-12-13 yes 4.5
7 5234XAR 2013-12-13 no 4.5
8 3434343 1972-05-23 yes 2.6
9 3434343 1980-11-01 no 2.6
Drop every second value in the value, relying on this thread: Pandas every nth row:
dateEANdf.loc[1::2,'Value']=np.nan
which will result in:
EAN-Unique Date Start Value
0 3324324 2018-06-01 yes 3.0
1 3324324 2019-04-30 no NaN
2 asd2343 2015-03-23 yes 2.0
3 asd2343 2015-07-11 no NaN
4 Xjkhfsd 1999-04-12 yes 1.2
5 Xjkhfsd 2001-02-01 no NaN
6 5234XAR 2000-12-13 yes 4.5
7 5234XAR 2013-12-13 no NaN
8 3434343 1972-05-23 yes 2.6
9 3434343 1980-11-01 no NaN
Upvotes: 1
Reputation: 16673
You could do a merge
and then update values with np.where
:
# If 'Value' is not already in 'dateEANdf', then remove `dateEANdf.drop('Value', axis=1)`
dateEANdf = dateEANdf.drop('Value', axis=1).merge(valueEANdf, how='left', on='EAN-Unique')
dateEANdf['Value'] = np.where(dateEANdf['Start'] == 'no', np.nan, dateEANdf['Value'])
dateEANdf
Out[1]:
EAN-Unique Date Start Value
0 3324324 2018-06-01 yes 3.0
1 3324324 2019-04-30 no NaN
2 asd2343 2015-03-23 yes 2.0
3 asd2343 2015-07-11 no NaN
4 Xjkhfsd 1999-04-12 yes 1.2
5 Xjkhfsd 2001-02-01 no NaN
6 5234XAR 2000-12-13 yes 4.5
7 5234XAR 2013-12-13 no NaN
8 3434343 1972-05-23 yes 2.6
9 3434343 1980-11-01 no NaN
Upvotes: 2
Reputation: 150735
Try loc
slicing, then map
:
s = dateEANdf['Start'].eq('yes')
dateEANdf.loc[s, 'Value'] = (dateEANdf.loc[s, 'EAN-Unique']
.map(valueEANdf.set_index('EAN-Unique')['Value'])
)
Or map the whole series then where
:
dateEANdf['Value'] = (dateEANdf['EAN-Unique'].map(valueEANdf.set_index('EAN-Unique')['Value'])
.where(dateEANdf['Start'].eq('yes'))
)
Output:
EAN-Unique Date Start Value
0 3324324 2018-06-01 yes 3.0
1 3324324 2019-04-30 no NaN
2 asd2343 2015-03-23 yes 2.0
3 asd2343 2015-07-11 no NaN
4 Xjkhfsd 1999-04-12 yes 1.2
5 Xjkhfsd 2001-02-01 no NaN
6 5234XAR 2000-12-13 yes 4.5
7 5234XAR 2013-12-13 no NaN
8 3434343 1972-05-23 yes 2.6
9 3434343 1980-11-01 no NaN
Upvotes: 2