Reputation: 169
I'm trying to create a new date column based on an existing date column in my dataframe. I want to take all the dates in the first column and make them the first of the month in the second column so:
03/15/2019 = 03/01/2019
I know I can do this using:
df['newcolumn'] = pd.to_datetime(df['oldcolumn'], format='%Y-%m-%d').apply(lambda dt: dt.replace(day=1)).dt.date
My issue is that some of the values in the old column are not valid dates; some rows contain text. So I'm trying to figure out how to either clean up the data before I do this, like:
if oldcolumn isn't a date then make it 01/01/1990 else oldcolumn
Or, is there a way to do this with try/except?
Any assistance would be appreciated.
Upvotes: 0
Views: 36945
Reputation: 1286
First, generate some sample data:
df = pd.DataFrame([['2019-01-03'], ['asdf'], ['2019-11-10']], columns=['Date'])
This can be safely converted to datetime; invalid strings become NaT and are then replaced with a default date:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
mask = df['Date'].isnull()
df.loc[mask, 'Date'] = pd.Timestamp(1990, 1, 1)
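Now you don't need the slow apply. Note that adding pd.offsets.MonthBegin(-1) moves a date that already falls on the first of a month back to the previous month (the 1990-01-01 default would become 1989-12-01), so truncate to a monthly period instead:
df['New'] = df['Date'].dt.to_period('M').dt.to_timestamp()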
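As a hedged, self-contained sketch of this answer's idea, the following compares offset arithmetic with period truncation on a date that already sits on the first of a month. The sample values and the column names ViaOffset and ViaPeriod are made up for illustration; the 1990-01-01 placeholder comes from the question.

```python
import pandas as pd

# Sample data: one value on the 1st, one mid-month, one invalid string.
df = pd.DataFrame({'Date': ['2019-03-01', '2019-03-15', 'asdf']})

# Invalid strings become NaT, then get the 1990-01-01 placeholder.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', errors='coerce')
df['Date'] = df['Date'].fillna(pd.Timestamp(1990, 1, 1))

# Offset arithmetic: dates already on the 1st roll back a whole month.
df['ViaOffset'] = df['Date'] + pd.offsets.MonthBegin(-1)

# Period truncation: every date maps to the 1st of its own month.
df['ViaPeriod'] = df['Date'].dt.to_period('M').dt.to_timestamp()

print(df)
```

With this input, ViaOffset gives 2019-02-01 for the 2019-03-01 row, while ViaPeriod leaves it at 2019-03-01.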
Upvotes: 2
Reputation: 42886
Try the argument errors='coerce'. This will return NaT for the text values.
df['newcolumn'] = pd.to_datetime(df['oldcolumn'],
                                 format='%Y-%m-%d',
                                 errors='coerce').apply(lambda dt: dt.replace(day=1)).dt.date
For example
# We have this dataframe
ID Date
0 111 03/15/2019
1 133 01/01/2019
2 948 Empty
3 452 02/10/2019
# We convert Date column to datetime
df['Date'] = pd.to_datetime(df.Date, format='%m/%d/%Y', errors='coerce')
Output
ID Date
0 111 2019-03-15
1 133 2019-01-01
2 948 NaT
3 452 2019-02-10
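To carry the example through to the first-of-month column the question asks for, here is a minimal sketch under a few assumptions: the placeholder 1990-01-01 is taken from the question, the input values are all strings, and first_of_month is a hypothetical helper that shows the try/except route next to the vectorized one.

```python
import pandas as pd

DEFAULT = pd.Timestamp(1990, 1, 1)  # placeholder date taken from the question

df = pd.DataFrame({'ID': [111, 133, 948, 452],
                   'Date': ['03/15/2019', '01/01/2019', 'Empty', '02/10/2019']})

# Vectorized route: coerce bad values to NaT, fill them, truncate to month start.
parsed = pd.to_datetime(df['Date'], format='%m/%d/%Y', errors='coerce')
df['newcolumn'] = (parsed.fillna(DEFAULT)
                         .dt.to_period('M')
                         .dt.to_timestamp()
                         .dt.date)


def first_of_month(value):
    """Per-value try/except route (hypothetical helper; assumes string input)."""
    try:
        return pd.to_datetime(value, format='%m/%d/%Y').replace(day=1).date()
    except (ValueError, TypeError):
        # Unparseable text falls back to the placeholder date.
        return DEFAULT.date()


df['via_try'] = df['Date'].apply(first_of_month)
print(df)
```

Both columns should hold the same dates here; the vectorized route is usually the better choice on large frames, since the helper parses one value at a time.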
Upvotes: 1