Reputation: 51
I have this dataframe and trying to find the time gap between first hire and either one terminated or shift change happened earlier when dates exists or null if conditions be true calculate for each Employee ID
Condition 1 : first hire happened at Q4
Condition 2 : if first hire happened after 2021 and the result appear at the TimeGap column .
df = pd.DataFrame.from_dict({'Employee No': [102, 103, 102, 102, 101, 103, 101, 105, 106, 102, 107, 108, 109, 109, 109, 109], 'Event date': ['2020-10-12', '2021-11-02', '2022-01-01', '2021-12-12', '2021-12-03', '2021-11-05', '2021-12-04', '2022-02-26', '2022-02-26', '2022-03-29', '2021-05-04', '2022-04-04', '2022-03-03', '2021-12-29', '2022-04-01', '2022-01-10'], 'EventDescription': ['First Hire', 'First Hire', 'Terminated', 'Shift Change', 'First Hire', 'Terminated', 'Terminated', 'First Hire', 'First Hire', 'Second Hire', 'First Hire', 'First Hire', 'Terminated', 'First Hire', 'Second Hire', 'Shift Change'], 'Quarter': ['Q4', 'Q4', 'Q1', 'Q4', 'Q4', 'Q4', 'Q4', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q1', 'Q4', 'Q2', 'Q1']})
Employee No Event date EventDescription Quarter
0 102 2020-10-12 First Hire Q4
1 103 2021-11-02 First Hire Q4
2 102 2022-01-01 Terminated Q1
3 102 2021-12-12 Shift Change Q4
4 101 2021-12-03 First Hire Q4
5 103 2021-11-05 Terminated Q4
6 101 2021-12-04 Terminated Q4
7 105 2022-02-26 First Hire Q1
8 106 2022-02-26 First Hire Q1
9 102 2022-03-29 Second Hire Q1
10 107 2021-05-04 First Hire Q2
11 108 2022-04-04 First Hire Q2
12 109 2022-03-03 Terminated Q1
13 109 2021-12-29 First Hire Q4
14 109 2022-04-01 Second Hire Q2
15 109 2022-01-10 Shift Change Q1
Result should be something like this :
index | Employee_No | Event_date | EventDescription | Quarter | Timegap |
---|---|---|---|---|---|
0 | 102 | 2020-10-12 | First Hire | Q4 | NaN |
1 | 103 | 2021-11-02 | First Hire | Q4 | NaN |
2 | 102 | 2022-01-01 | Terminated | Q1 | NaN |
3 | 102 | 2021-12-12 | Shift Change | Q4 | NaN |
4 | 101 | 2021-12-03 | First Hire | Q4 | NaN |
5 | 103 | 2021-11-05 | Terminated | Q4 | 3.0 |
6 | 101 | 2021-12-04 | Terminated | Q4 | 1.0 |
7 | 105 | 2022-02-26 | First Hire | Q1 | NaN |
8 | 106 | 2022-02-26 | First Hire | Q1 | NaN |
9 | 102 | 2022-03-29 | Second Hire | Q1 | NaN |
10 | 107 | 2021-05-04 | First Hire | Q2 | NaN |
11 | 108 | 2022-04-04 | First Hire | Q2 | NaN |
12 | 109 | 2022-03-03 | Terminated | Q1 | NaN |
13 | 109 | 2021-12-29 | First Hire | Q4 | NaN |
14 | 109 | 2022-04-01 | Second Hire | Q2 | NaN |
15 | 109 | 2022-01-10 | Shift Change | Q1 | 12.0 |
Upvotes: 0
Views: 42
Reputation: 196
You can create two dataframe representing first hire and first stop, and then merge them.
gap_df = pd.DataFrame.from_dict({'Employee No': [102, 103, 102, 102, 101, 103, 101, 105, 106, 102, 107, 108, 109, 109, 109, 109], 'Event date': ['2020-10-12', '2021-11-02', '2022-01-01', '2021-12-12', '2021-12-03', '2021-11-05', '2021-12-04', '2022-02-26', '2022-02-26', '2022-03-29', '2021-05-04', '2022-04-04', '2022-03-03', '2021-12-29', '2022-04-01', '2022-01-10'], 'EventDescription': [
'First Hire', 'First Hire', 'Terminated', 'Shift Change', 'First Hire', 'Terminated', 'Terminated', 'First Hire', 'First Hire', 'Second Hire', 'First Hire', 'First Hire', 'Terminated', 'First Hire', 'Second Hire', 'Shift Change'], 'Quarter': ['Q4', 'Q4', 'Q1', 'Q4', 'Q4', 'Q4', 'Q4', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q1', 'Q4', 'Q2', 'Q1']})
gap_df["Event date"] = pd.to_datetime(gap_df["Event date"])
gap_df = gap_df.sort_values(["Employee No", "Event date"])
q4_first_hire_df = gap_df.query("Quarter == 'Q4' and EventDescription == 'First Hire'").rename({
"Event date": "first_hire_date"}, axis=1)
q4_first_hire_df = q4_first_hire_df[q4_first_hire_df["first_hire_date"].map(
lambda x:x.year) >= 2021]
first_stop_df = gap_df.query("EventDescription in ('Terminated','Shift Change')").groupby(
"Employee No")["Event date"].min().reset_index().rename({"Event date": "first_stop_date"}, axis=1)
combine_df = pd.merge(q4_first_hire_df, first_stop_df, on="Employee No")
combine_df["Timegap"] = [
x.days for x in combine_df["first_stop_date"] - combine_df["first_hire_date"]]
pd.merge(gap_df, combine_df[["Employee No", "Timegap", 'first_stop_date']],
left_on=["Employee No", "Event date"],
right_on=["Employee No", "first_stop_date"],
how="left").drop("first_stop_date", axis=1)
Upvotes: 1