Reputation: 101
I have this dataframe
import pandas as pd
df = pd.DataFrame({'Found':['A','A','A','A','A','B','B','B'],
'Date':['14/10/2021','19/10/2021','29/10/2021','30/09/2021','20/09/2021','20/10/2021','29/10/2021','15/10/2021'],
'LastDayMonth':['29/10/2021','29/10/2021','29/10/2021','30/09/2021','30/09/2021','29/10/2021','29/10/2021','29/10/2021'],
'Mark':[1,2,3,4,3,1,2,3]
})
print(df)
Found Date LastDayMonth Mark
0 A 14/10/2021 29/10/2021 1
1 A 19/10/2021 29/10/2021 2
2 A 29/10/2021 29/10/2021 3
3 A 30/09/2021 30/09/2021 4
4 A 20/09/2021 30/09/2021 3
5 B 20/10/2021 29/10/2021 1
6 B 29/10/2021 29/10/2021 2
7 B 15/10/2021 29/10/2021 3
based on this dataframe I need to create a new column that is the "Mark" of the last day of the month to form this new column.
that is, I need the value of the 'Mark' column of the last day of the month of each Found
Result Example
Found Date LastDayMonth Mark Mark_LastDayMonth
0 A 14/10/2021 29/10/2021 1 3
1 A 19/10/2021 29/10/2021 2 3
2 A 29/10/2021 29/10/2021 3 3
3 A 30/09/2021 30/09/2021 4 4
4 A 20/09/2021 30/09/2021 3 4
5 B 20/10/2021 29/10/2021 1 2
6 B 29/10/2021 29/10/2021 2 2
7 B 15/10/2021 29/10/2021 3 2
how could I do that? a function? I need to do this for a large amount of data
Upvotes: 1
Views: 88
Reputation: 365
IIUC, for each Found
, you check which is the Mark
value at the last day of the month and you assign that value to Mark_LastDayMonth
.
You can proceed the following:
# 1. Select last days
mark_last_day = df.loc[df.apply(lambda x: x['Date']==x['LastDayMonth'], 1)]
# 2. Merge them to the original dataset, renaming the right Mark value to Mark_LastDayMonth
df.merge(mark_last_day[['Found', 'LastDayMonth', 'Mark']],
how='left',
on=['Found', 'LastDayMonth'],
suffixes=('', '_LastDayMonth'))
# Output
Found Date LastDayMonth Mark Mark_LastDayMonth
0 A 14/10/2021 29/10/2021 1 3
1 A 19/10/2021 29/10/2021 2 3
2 A 29/10/2021 29/10/2021 3 3
3 A 30/09/2021 30/09/2021 4 4
4 A 20/09/2021 30/09/2021 3 4
5 B 20/10/2021 29/10/2021 1 2
6 B 29/10/2021 29/10/2021 2 2
7 B 15/10/2021 29/10/2021 3 2
The first step, that is this line of code df.loc[df.apply(lambda x: x['Date']==x['LastDayMonth'], 1)]
will produce the following:
Found Date LastDayMonth Mark
2 A 29/10/2021 29/10/2021 3
3 A 30/09/2021 30/09/2021 4
6 B 29/10/2021 29/10/2021 2
This output is then merged to the orginal dataframe df
Upvotes: 1