user3047030
user3047030

Reputation: 69

How to create a pandas DataFrame column based on two other columns that holds dates?

I have a pandas Dataframe with two date columns (A and B) and I would like to create a 3rd column (C) that holds dates created using month and year from column A and the day of column B. Obviously I would need to change the day for the months that day doesn't exist like we try to create 31st Feb 2020, it would need to change it to 29th Feb 2020.

For example

import pandas as pd
df = pd.DataFrame({'A': ['2020-02-21', '2020-03-21', '2020-03-21'], 
                   'B': ['2020-01-31', '2020-02-11', '2020-02-01']})
for c in df.columns:
    dfx[c] = pd.to_datetime(dfx[c])

Then I want to create a new column C that is a new datetime that is:

year = df.A.dt.year

month = df.A.dt.month

day = df.B.dt.day

I don't know how to create this column. Can you please help?

Upvotes: 0

Views: 65

Answers (1)

Arne
Arne

Reputation: 10545

Here is one way to do it, using pandas' time series functionality:

import pandas as pd

# your example data
df = pd.DataFrame({'A': ['2020-02-21', '2020-03-21', '2020-03-21'], 
                   'B': ['2020-01-31', '2020-02-11', '2020-02-01']})
for c in df.columns:
    # keep using the same dataframe here
    df[c] = pd.to_datetime(df[c])

# set back every date from A to the end of the previous month,
# then add the number of days from the date in B
df['C'] = df.A - pd.offsets.MonthEnd() + pd.TimedeltaIndex(df.B.dt.day, unit='D')

display(df)

Result:

             A           B           C
0   2020-02-21  2020-01-31  2020-03-02
1   2020-03-21  2020-02-11  2020-03-11
2   2020-03-21  2020-02-01  2020-03-01

As you can see in row 0, this handles the case of "February 31st" not quite as you suggested, but still in a logical way.

Upvotes: 1

Related Questions