Reputation: 1293
I have a Dataframe with list of unique working dates in a month, called days.
I have another Dataframe that has the list of days an employee has reported to work, called reported.
I am trying to find days when an employee did not report to work.
Given below is sample view of the Dataframes:
days
11/1/2018
11/2/2018
11/3/2018
11/4/2018
11/5/2018
11/6/2018
11/7/2018
View of the Dataframe called reported:
emp_id, date
1001,11/1/2018
1001,11/2/2018
1002,11/1/2018
1002,11/3/2018
1003,11/3/2018
Expected output (that lists down days an employee did not report):
emp_id,date
1001,11/3/2018
1002,11/2/2018
1003,11/1/2018
1003,11/2/2018
Update I created the first Dataframe as below
import numpy as np
df_1 = np.sort(df_master.date.unique())
df_1_df = pd.DataFrame(df_1)
df_1_df.columns
RangeIndex(start=0, stop=1, step=1)
Upvotes: 1
Views: 747
Reputation: 862681
You need merge
with default inner join and then reindex
by all unique values of emp_id
and date
, last merge
with parameter indicator=True
for filtering not reported rows:
reported['date'] = pd.to_datetime(reported['date'])
df1 = reported.set_index(['emp_id', 'date'])
mux = pd.MultiIndex.from_product(df1.index.levels, names=df1.index.names)
df3 = (df1.reindex(mux)
.reset_index()
.merge(reported, indicator=True, how='outer')
.query('_merge != "both"')
.drop('_merge', axis=1))
print (df3)
emp_id date
2 1001 2018-11-03
4 1002 2018-11-02
6 1003 2018-11-01
7 1003 2018-11-02
Another solution, thanks @IMCoins is filtering by isin
with ~
for invert boolean mask:
df2 = df1.reindex(mux)
df3 = df2[~df2.index.isin(df1.index)].reset_index()
print (df3)
emp_id date
0 1001 2018-11-03
1 1002 2018-11-02
2 1003 2018-11-01
3 1003 2018-11-02
Upvotes: 2