caddie

Reputation: 169

Create new date column in python pandas

I'm trying to create a new date column based on an existing date column in my dataframe. I want to take all the dates in the first column and make them the first of the month in the second column so:

03/15/2019 = 03/01/2019

I know I can do this using:

df['newcolumn'] = pd.to_datetime(df['oldcolumn'], format='%Y-%m-%d').apply(lambda dt: dt.replace(day=1)).dt.date

My issue is that some of the values in the old column are not valid dates; some rows contain text. So I'm trying to figure out how to clean up the data before doing this, something like:

if oldcolumn isn't a date then make it 01/01/1990 else oldcolumn

Or, is there a way to do this with try/except?

Any assistance would be appreciated.

Upvotes: 0

Views: 36945

Answers (2)

JoergVanAken

Reputation: 1286

First, generate some sample data:

import pandas as pd

df = pd.DataFrame([['2019-01-03'], ['asdf'], ['2019-11-10']], columns=['Date'])

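This can be safely converted to datetime. With errors='coerce', values that cannot be parsed become NaT, and those rows can then be set to the default date:

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
mask = df['Date'].isnull()
df.loc[mask, 'Date'] = pd.Timestamp('1990-01-01')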

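Now you don't need the slow apply. Converting to a monthly period and back gives the first of each month. (Adding pd.offsets.MonthBegin(-1) would move dates that already fall on the 1st, such as the 1990-01-01 default, back to the previous month.)

df['New'] = df['Date'].dt.to_period('M').dt.to_timestamp()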
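
The steps above can be wrapped into one function. This is only a sketch: the function name first_of_month, the '%m/%d/%Y' input format (taken from the example in the question), and the column names are assumptions, not part of the original answer.

```python
import pandas as pd


def first_of_month(dates, fmt="%m/%d/%Y", default="1990-01-01"):
    """Parse strings as dates, replace invalid ones with `default`,
    and return the first day of each date's month."""
    # Values that do not match fmt become NaT instead of raising.
    parsed = pd.to_datetime(dates, format=fmt, errors="coerce")
    parsed = parsed.fillna(pd.Timestamp(default))
    # Converting to a monthly period and back yields the month start,
    # including for dates that already fall on the 1st.
    return parsed.dt.to_period("M").dt.to_timestamp()


df = pd.DataFrame({"oldcolumn": ["03/15/2019", "some text", "01/01/2019"]})
df["newcolumn"] = first_of_month(df["oldcolumn"])
print(df)
```

The result is a datetime64 column; append .dt.date if plain Python date objects are needed, as in the question.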

Upvotes: 2

Erfan

Reputation: 42886

Try the argument errors='coerce'. This returns NaT for the text values instead of raising an error.

df['newcolumn'] = pd.to_datetime(df['oldcolumn'], 
                                 format='%Y-%m-%d', 
                                 errors='coerce').apply(lambda dt: dt.replace(day=1)).dt.date

For example

# We have this dataframe
    ID        Date
0  111  03/15/2019
1  133  01/01/2019
2  948       Empty
3  452  02/10/2019

# We convert Date column to datetime
df['Date'] = pd.to_datetime(df.Date, format='%m/%d/%Y', errors='coerce')

Output

    ID       Date
0  111 2019-03-15
1  133 2019-01-01
2  948        NaT
3  452 2019-02-10
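
The question also asks whether try/except can do this. For comparison, here is a row-wise sketch that starts from the original string column of the example above. The helper name month_start_or_fallback and the 1990-01-01 fallback (from the question) are assumptions. It runs a Python function per row, so it is likely slower on large frames than the errors='coerce' route.

```python
from datetime import date, datetime

import pandas as pd

FALLBACK = date(1990, 1, 1)  # assumed default, taken from the question


def month_start_or_fallback(value, fmt="%m/%d/%Y"):
    """Return the first of the month for a valid date string, else FALLBACK."""
    try:
        return datetime.strptime(value, fmt).date().replace(day=1)
    except (TypeError, ValueError):
        # TypeError: value is not a string (e.g. None or NaN)
        # ValueError: the string does not match fmt
        return FALLBACK


df = pd.DataFrame({
    "ID": [111, 133, 948, 452],
    "Date": ["03/15/2019", "01/01/2019", "Empty", "02/10/2019"],
})
df["newcolumn"] = df["Date"].apply(month_start_or_fallback)
print(df)
```

Here newcolumn holds plain datetime.date objects (object dtype), which matches the .dt.date output in the question.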

Upvotes: 1
