Reputation: 4408
I have a dataframe, df, where I have two columns that contain certain values. If the values match a specific string, I would like to then place a certain value in a new column for multiple columns.
Data
ID Quarter Delivery
A Q1 2022 Q3 2022
A Q1 2022 Q3 2022
B Q1 2022 Q3 2022
B Q1 2022 Q3 2022
Desired
ID Quarter Delivery QuarterFull DeliveryFull
A Q1 2022 Q3 2022 1/1/2022 07/1/2022
A Q1 2022 Q3 2022 1/1/2022 07/1/2022
B Q4 2022 Q2 2023 10/1/2022 04/1/2023
B Q4 2022 Q2 2023 10/1/2022 04/1/2023
Q1 = 01
Q2 = 04
Q3 = 07
Q4 = 10
Doing
for Quarter, item in df.iteritems():
if df['Quarter'] == df['Q1 2022']:
return df['1/1/2022']
elif df['Quarter'] == df['Q4 2022']:
return df['10/1/2022']
elif df['Delivery'] == df['Q3 2022']:
return df['07/01/2022']
elif df['Delivery'] == df['Q2 2023']:
return df['04/01/2023]
else:
return df['None']
I believe running a loop may be the best way to approach since I have a decent sized dataset of 100,000 rows. I am not sure how to add these two new columns to the dataframe while using the loop. Any suggestion is appreciated
Upvotes: 0
Views: 75
Reputation: 14949
You can just manipulate Quarter/Delivery column to get the required result:
df['Quarter'] = df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex =True)
df['Delivery'] = df['Delivery'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex =True)
df['QuarterFull'] = pd.PeriodIndex(df['Quarter'], freq='Q').to_timestamp().strftime('%m-%d-%Y')
df['DeliveryFull'] = pd.PeriodIndex(df['Delivery'], freq='Q').to_timestamp().strftime('%m-%d-%Y')
ID Quarter Delivery QuarterFull DeliveryFull
0 A 2022-Q1 2022-Q3 01-01-2022 07-01-2022
1 A 2022-Q1 2022-Q3 01-01-2022 07-01-2022
2 B 2022-Q4 2022-Q2 10-01-2022 04-01-2022
3 B 2022-Q4 2022-Q2 10-01-2022 04-01-2022
Upvotes: 1