Reputation: 447
I have two dataframes (one that lists all days in a month and another that has the days when a staff member marked attendance). I am trying to perform a left join so that I get a new DataFrame with all dates, showing the dates when the employee did and did not mark attendance.
This is what df1 looks like:
days
01-01-2018
02-01-2018
03-01-2018
04-01-2018
05-01-2018
06-01-2018
07-01-2018
This is what df2 looks like:
date, emp_id
01-01-2018,101
03-01-2018,101
04-01-2018,101
06-01-2018,101
I am trying to create a new Dataframe as below:
date,marked,emp_id
01-01-2018,01-01-2018,101
02-01-2018,02-01-2018,101
03-01-2018,03-01-2018,101
04-01-2018,04-01-2018,101
05-01-2018,05-01-2018,101
06-01-2018,06-01-2018,101
For days that exist in both df1 and df2, the marked column of the new DataFrame should hold the date; for all other days it should be null. I tried the following, but it returns all dates:
new_df = pd.merge(df1, df2, how='left', left_on=['days'], right_on = ['date'])
Upvotes: 0
Views: 88
Reputation: 61910
You could do something like this:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)
Output:
        days       date  emp_id
0 2018-01-01 2018-01-01   101.0
1 2018-01-02        NaT   101.0
2 2018-01-03 2018-01-03   101.0
3 2018-01-04 2018-01-04   101.0
4 2018-01-05        NaT   101.0
5 2018-01-06 2018-01-06   101.0
6 2018-01-07        NaT   101.0
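For a self-contained check, here is a minimal sketch that rebuilds the sample frames from the question. It assumes the date strings are day-first (%d-%m-%Y) and that there is only one employee, 101, so the missing ids can be filled with a constant. Since every date in df2 also appears in df1, a left join should give the same rows as the outer join here; days without attendance show up as NaT in the date column.

```python
import pandas as pd

# Sample data rebuilt from the question; dates are assumed to be day-first strings.
df1 = pd.DataFrame({
    "days": pd.to_datetime(
        ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018",
         "05-01-2018", "06-01-2018", "07-01-2018"],
        format="%d-%m-%Y",
    )
})
df2 = pd.DataFrame({
    "date": pd.to_datetime(
        ["01-01-2018", "03-01-2018", "04-01-2018", "06-01-2018"],
        format="%d-%m-%Y",
    ),
    "emp_id": [101, 101, 101, 101],
})

# A left join keeps every row of df1; days with no match get NaT / NaN.
new_df = pd.merge(df1, df2, how="left", left_on="days", right_on="date")

# Assumption: a single employee, so missing ids can be filled with 101.
new_df["emp_id"] = new_df["emp_id"].fillna(101).astype(int)
print(new_df)
```

With several employees a constant fill would not be appropriate; in that case you would probably need one row per (day, emp_id) pair before merging.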
If you want an indicator column instead, do this:
import numpy as np
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on=['date']).fillna({'emp_id': 101.0})
# 1 where the day has a matching attendance date, 0 otherwise (NaT never compares equal)
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)
Output:
        days  emp_id  marked
0 2018-01-01   101.0       1
1 2018-01-02   101.0       0
2 2018-01-03   101.0       1
3 2018-01-04   101.0       1
4 2018-01-05   101.0       0
5 2018-01-06   101.0       1
6 2018-01-07   101.0       0
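As an alternative sketch, pandas can produce the match flag itself through the indicator argument of merge, which adds a _merge column with the values "both", "left_only" or "right_only". The frames below are rebuilt from the question's sample data, and the names merged and marked_flags are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question: seven days, four with attendance.
days = pd.date_range("2018-01-01", "2018-01-07", freq="D")
df1 = pd.DataFrame({"days": days})
df2 = pd.DataFrame({"date": days[[0, 2, 3, 5]], "emp_id": 101})

# indicator=True adds a "_merge" column describing where each row came from.
merged = df1.merge(df2, how="left", left_on="days", right_on="date", indicator=True)

# "both" means the day was found in df2, i.e. attendance was marked.
merged["marked"] = (merged["_merge"] == "both").astype("uint8")
merged = merged.drop(columns=["date", "_merge"])

marked_flags = merged["marked"].tolist()
print(merged)
```

This avoids comparing the two date columns directly, which may be easier to read when the join uses more than one key.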
Upvotes: 1