Reputation: 250
I need two col
1st one shows the closest date and 2nd one shows the name of the col
d = {'col1': ["id1","id2"] 'Stage 1': [26-01-2021, 04-01-2021],'Stage 2': [27-01-2021, 02-10-2025]}
df = pd.DataFrame(data=d)
df
image reference
Actual
Requirement
I tried
date=datetime.date.today()
idx = plc.index[df[['Stage 1','Stage 2']].index.get_loc(date,
method='nearest')]
Upvotes: 0
Views: 680
Reputation: 9619
After converting the dates to datetime you can pass it to a function that populates both new columns in one go:
import pandas
import datetime
d = {'col1': ["id1","id2"], 'Stage 1': ['26-01-2021', '04-01-2021'],'Stage 2': ['27-01-2021', '02-10-2025']}
df = pd.DataFrame(data=d)
df['Stage 1'] = pd.to_datetime(df['Stage 1'], format='%d-%m-%Y')
df['Stage 2'] = pd.to_datetime(df['Stage 2'], format='%d-%m-%Y')
date=pd.to_datetime(datetime.date.today())
def get_date(row):
date_range = row[['Stage 1', 'Stage 2']]
closest_date_key = abs(date - date_range).argmin()
closest_date = date_range[closest_date_key]
column_name = date_range.keys()[closest_date_key]
return pd.Series((closest_date, column_name))
df[['Requirement 1', 'Requirement 2']] = df.apply(lambda row:get_date(row), axis=1)
Output:
| | col1 | Stage 1 | Stage 2 | Requirement 1 | Requirement 2 |
|---:|:-------|:--------------------|:--------------------|:--------------------|:----------------|
| 0 | id1 | 2021-01-26 00:00:00 | 2021-01-27 00:00:00 | 2021-01-27 00:00:00 | Stage 2 |
| 1 | id2 | 2021-01-04 00:00:00 | 2025-10-02 00:00:00 | 2021-01-04 00:00:00 | Stage 1 |
Upvotes: 1
Reputation: 545
The following would get you there:
First convert your dates to datetimes
so you can use them in comparison operations:
df['Stage 1'] = pd.to_datetime(df['Stage 1'])
df['Stage 2'] = pd.to_datetime(df['Stage 2'])
Then find the position of the closest date for example using:
closest = np.argmin([abs(df['Stage 1'].dt.date - date),
abs(df['Stage 2'].dt.date - date)], axis=1)
And use these positions to get the names of your columns. You can assign that value in your first new column Requirement 1
df['Requirement 1'] = df.columns[-2:][closest]
You can then use the column name in Requirement 1
to get the original date that was closest:
df['Requirement 2'] = df.apply(lambda x: x[x['Requirement 1']].date(), axis=1)
My output looks then like:
col1 Stage 1 Stage 2 Requirement 1 Requirement 2
0 id1 2021-01-26 2025-01-27 Stage 1 2021-01-26
1 id2 2025-04-01 2019-02-10 Stage 2 2019-02-10
Upvotes: 1