HelperPop
HelperPop

Reputation: 51

create pandas column based on condition on row values and other column grouped

I have this dataframe and trying to find the time gap between first hire and either one terminated or shift change happened earlier when dates exists or null if conditions be true calculate for each Employee ID

Condition 1 : first hire happened at Q4

Condition 2 : if first hire happened after 2021 and the result appear at the TimeGap column .

df = pd.DataFrame.from_dict({'Employee No': [102, 103, 102, 102, 101, 103, 101, 105, 106, 102, 107, 108, 109, 109, 109, 109], 'Event date': ['2020-10-12', '2021-11-02', '2022-01-01', '2021-12-12', '2021-12-03', '2021-11-05', '2021-12-04', '2022-02-26', '2022-02-26', '2022-03-29', '2021-05-04', '2022-04-04', '2022-03-03', '2021-12-29', '2022-04-01', '2022-01-10'], 'EventDescription': ['First Hire', 'First Hire', 'Terminated', 'Shift Change', 'First Hire', 'Terminated', 'Terminated', 'First Hire', 'First Hire', 'Second Hire', 'First Hire', 'First Hire', 'Terminated', 'First Hire', 'Second Hire', 'Shift Change'], 'Quarter': ['Q4', 'Q4', 'Q1', 'Q4', 'Q4', 'Q4', 'Q4', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q1', 'Q4', 'Q2', 'Q1']})

    Employee No  Event date EventDescription Quarter
0           102  2020-10-12       First Hire      Q4
1           103  2021-11-02       First Hire      Q4
2           102  2022-01-01       Terminated      Q1
3           102  2021-12-12     Shift Change      Q4
4           101  2021-12-03       First Hire      Q4
5           103  2021-11-05       Terminated      Q4
6           101  2021-12-04       Terminated      Q4
7           105  2022-02-26       First Hire      Q1
8           106  2022-02-26       First Hire      Q1
9           102  2022-03-29      Second Hire      Q1
10          107  2021-05-04       First Hire      Q2
11          108  2022-04-04       First Hire      Q2
12          109  2022-03-03       Terminated      Q1
13          109  2021-12-29       First Hire      Q4
14          109  2022-04-01      Second Hire      Q2
15          109  2022-01-10     Shift Change      Q1

Result should be something like this :

index Employee_No Event_date EventDescription Quarter Timegap
0 102 2020-10-12 First Hire Q4 NaN
1 103 2021-11-02 First Hire Q4 NaN
2 102 2022-01-01 Terminated Q1 NaN
3 102 2021-12-12 Shift Change Q4 NaN
4 101 2021-12-03 First Hire Q4 NaN
5 103 2021-11-05 Terminated Q4 3.0
6 101 2021-12-04 Terminated Q4 1.0
7 105 2022-02-26 First Hire Q1 NaN
8 106 2022-02-26 First Hire Q1 NaN
9 102 2022-03-29 Second Hire Q1 NaN
10 107 2021-05-04 First Hire Q2 NaN
11 108 2022-04-04 First Hire Q2 NaN
12 109 2022-03-03 Terminated Q1 NaN
13 109 2021-12-29 First Hire Q4 NaN
14 109 2022-04-01 Second Hire Q2 NaN
15 109 2022-01-10 Shift Change Q1 12.0

Upvotes: 0

Views: 42

Answers (1)

brendon
brendon

Reputation: 196

You can create two dataframe representing first hire and first stop, and then merge them.

gap_df = pd.DataFrame.from_dict({'Employee No': [102, 103, 102, 102, 101, 103, 101, 105, 106, 102, 107, 108, 109, 109, 109, 109], 'Event date': ['2020-10-12', '2021-11-02', '2022-01-01', '2021-12-12', '2021-12-03', '2021-11-05', '2021-12-04', '2022-02-26', '2022-02-26', '2022-03-29', '2021-05-04', '2022-04-04', '2022-03-03', '2021-12-29', '2022-04-01', '2022-01-10'], 'EventDescription': [
                                'First Hire', 'First Hire', 'Terminated', 'Shift Change', 'First Hire', 'Terminated', 'Terminated', 'First Hire', 'First Hire', 'Second Hire', 'First Hire', 'First Hire', 'Terminated', 'First Hire', 'Second Hire', 'Shift Change'], 'Quarter': ['Q4', 'Q4', 'Q1', 'Q4', 'Q4', 'Q4', 'Q4', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q1', 'Q4', 'Q2', 'Q1']})

gap_df["Event date"] = pd.to_datetime(gap_df["Event date"])
gap_df = gap_df.sort_values(["Employee No", "Event date"])

q4_first_hire_df = gap_df.query("Quarter == 'Q4' and EventDescription == 'First Hire'").rename({
    "Event date": "first_hire_date"}, axis=1)
q4_first_hire_df = q4_first_hire_df[q4_first_hire_df["first_hire_date"].map(
    lambda x:x.year) >= 2021]
first_stop_df = gap_df.query("EventDescription in ('Terminated','Shift Change')").groupby(
    "Employee No")["Event date"].min().reset_index().rename({"Event date": "first_stop_date"}, axis=1)

combine_df = pd.merge(q4_first_hire_df, first_stop_df, on="Employee No")
combine_df["Timegap"] = [
    x.days for x in combine_df["first_stop_date"] - combine_df["first_hire_date"]]

pd.merge(gap_df, combine_df[["Employee No", "Timegap", 'first_stop_date']],
         left_on=["Employee No", "Event date"],
         right_on=["Employee No", "first_stop_date"],
         how="left").drop("first_stop_date", axis=1)

Upvotes: 1

Related Questions