Eric Alves
Eric Alves

Reputation: 101

create new column based on other columns

I have this dataframe

import pandas as pd

df = pd.DataFrame({'Found':['A','A','A','A','A','B','B','B'],
               'Date':['14/10/2021','19/10/2021','29/10/2021','30/09/2021','20/09/2021','20/10/2021','29/10/2021','15/10/2021'],
               'LastDayMonth':['29/10/2021','29/10/2021','29/10/2021','30/09/2021','30/09/2021','29/10/2021','29/10/2021','29/10/2021'],
               'Mark':[1,2,3,4,3,1,2,3]

              })
print(df)

Found        Date LastDayMonth  Mark
0     A  14/10/2021   29/10/2021     1
1     A  19/10/2021   29/10/2021     2
2     A  29/10/2021   29/10/2021     3
3     A  30/09/2021   30/09/2021     4
4     A  20/09/2021   30/09/2021     3
5     B  20/10/2021   29/10/2021     1
6     B  29/10/2021   29/10/2021     2
7     B  15/10/2021   29/10/2021     3

based on this dataframe I need to create a new column that is the "Mark" of the last day of the month to form this new column.

that is, I need the value of the 'Mark' column of the last day of the month of each Found

Result Example

Found        Date LastDayMonth  Mark  Mark_LastDayMonth
0     A  14/10/2021   29/10/2021     1                  3
1     A  19/10/2021   29/10/2021     2                  3
2     A  29/10/2021   29/10/2021     3                  3
3     A  30/09/2021   30/09/2021     4                  4
4     A  20/09/2021   30/09/2021     3                  4
5     B  20/10/2021   29/10/2021     1                  2
6     B  29/10/2021   29/10/2021     2                  2
7     B  15/10/2021   29/10/2021     3                  2

how could I do that? a function? I need to do this for a large amount of data

Upvotes: 1

Views: 88

Answers (1)

Jacopo Repossi
Jacopo Repossi

Reputation: 365

IIUC, for each Found, you check which is the Mark value at the last day of the month and you assign that value to Mark_LastDayMonth.

You can proceed the following:

# 1. Select last days
mark_last_day = df.loc[df.apply(lambda x: x['Date']==x['LastDayMonth'], 1)]

# 2. Merge them to the original dataset, renaming the right Mark value to Mark_LastDayMonth
df.merge(mark_last_day[['Found', 'LastDayMonth', 'Mark']],
     how='left',
     on=['Found', 'LastDayMonth'],
     suffixes=('', '_LastDayMonth'))

# Output
Found   Date    LastDayMonth    Mark    Mark_LastDayMonth
0   A   14/10/2021  29/10/2021  1       3
1   A   19/10/2021  29/10/2021  2       3
2   A   29/10/2021  29/10/2021  3       3
3   A   30/09/2021  30/09/2021  4       4
4   A   20/09/2021  30/09/2021  3       4 
5   B   20/10/2021  29/10/2021  1       2
6   B   29/10/2021  29/10/2021  2       2
7   B   15/10/2021  29/10/2021  3       2

The first step, that is this line of code df.loc[df.apply(lambda x: x['Date']==x['LastDayMonth'], 1)] will produce the following:

    Found   Date    LastDayMonth    Mark
2   A   29/10/2021  29/10/2021      3
3   A   30/09/2021  30/09/2021      4
6   B   29/10/2021  29/10/2021      2

This output is then merged to the orginal dataframe df

Upvotes: 1

Related Questions