Reputation: 117
I have the following dataset
trade_id start_date end_date factset_entity_id product_id product_name l6_id multi_assign_flag
0 100000191 2017 NaT 0711GY-E 99996362 Fan Milk - FanYogo 5.015152e+11 0.0
1 100000193 2017 2018 0711GY-E 99996413 Fan Milk - FanDango/Frozen FanDango 5.015151e+11 0.0
2 100000193 2018 2022 0711GY-E 99996413 Fan Milk - FanDango 5.015151e+11 0.0
3 100000193 2022 NaT 0711GY-E 99996413 Fan Milk - FanDango 5.015151e+11 0.0
4 100000231 2016 2018 05NC3S-E 59979529 Ci:z Holdings - Dr. Ci:Labo 5.020102e+11 1.0
My goal is to duplicate the rows in which start_date and end_date are different, for instance
1 100000193 2017 2018 0711GY-E 99996413 Fan Milk - FanDango/Frozen FanDango 5.015151e+11 0.0
should be
2017 100000193 0711GY-E 99996413 Fan Milk - FanDango/Frozen FanDango 5.015151e+11 0.0
2018 100000193 0711GY-E 99996413 Fan Milk - FanDango/Frozen FanDango 5.015151e+11 0.0
Do you have any idea about what could I do here without using loops? Thanks
I try to do it using simple loops and it's fine, but it's slow. I wonder if there are shortcuts in pandas that ease it.
Upvotes: 1
Views: 41
Reputation: 13458
With the following toy dataframe in which rows 0 and 2 have different start and end dates:
import pandas as pd
df = pd.DataFrame(
{
"trade_id": [100000191, 100000192, 100000193, 100000194, 100000231],
"start_date": [2017, 2017, 2018, 2022, 2018],
"end_date": [None, 2017, 2022, 2022, 2018],
"factset_entity_id": [
"0711GY-E",
"0711GY-E",
"0711GY-E",
"0711GY-E",
"05NC3S-E",
],
"product_id": [99996362, 99996413, 99996414, 99996415, 59979529],
}
)
Here is one way to do it using pd.concat:
df = pd.concat([df, df.loc[df["start_date"] != df["end_date"], :]]).sort_index()
Then, you can see that rows 0 and 2 are now duplicated:
print(df)
# Output
trade_id start_date end_date factset_entity_id product_id
0 100000191 2017 NaN 0711GY-E 99996362
0 100000191 2017 NaN 0711GY-E 99996362
1 100000192 2017 2017.0 0711GY-E 99996413
2 100000193 2018 2022.0 0711GY-E 99996414
2 100000193 2018 2022.0 0711GY-E 99996414
3 100000194 2022 2022.0 0711GY-E 99996415
4 100000231 2018 2018.0 05NC3S-E 59979529
Upvotes: 1