okkotsu
okkotsu

Reputation: 127

finding closest dates in DataFrame

I want to find which column's date is closest to the date in cv_date_x column and return True there. I tried the below code but it returned False to all columns. Is there any solutions?? Thank you so much.

pre_df = pre_df.apply(lambda x: pd.to_datetime(x, errors='coerce'))
# set datetime to all values 
delete_col = ['cv_date_x','cv_date_y','userid']  
pre_cols = pre_df.columns.drop(delete_col)# remove unnecessary cols
    
pre_df1 = pre_df.apply(lambda x: x == np.min(abs(x[pre_cols] - x['cv_date_x'])), axis=1)
print(pre_df1)


The output of "print(pre_df1)"

cv_date_x,col1,col2,col3
False,False,False,False

pre_df

cv_date_x,col1,col2,col3
2021-07-01,2021-05-01,2021-07-02,2021-07-30
⇩
ideal output
cv_date_x,col1,col2,col3
False,False,True,False

because 2021-07-02 is closest to 2021-07-01

Upvotes: 1

Views: 74

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Setup

print(df)

   cv_date_x       col1       col2       col3
0 2021-07-01 2021-05-01 2021-07-02 2021-07-30
1 2021-07-28 2021-05-01 2021-07-02 2021-07-30

Solution with Numpy

a = df.to_numpy()
i = np.abs(a[:, 1:] - a[:, 0, None]).argmin(1) + 1

df_out = pd.DataFrame(i[:, None] == range(df.shape[1]), columns=df.columns)

Solution with pandas

df_out = (
    df.drop('cv_date_x', axis=1)
      .sub(df['cv_date_x'], axis=0)
      .abs().idxmin(1).str.get_dummies()
      .reindex(df.columns, axis=1, fill_value=0)
      .astype('bool')
)

Result

print(df_out)

   cv_date_x   col1   col2   col3
0      False  False   True  False
1      False  False  False   True

Upvotes: 2

Related Questions