Kevin Zakka
Kevin Zakka

Reputation: 465

Dropping column values that don't meet a requirement

I have a pandas data frame with a 'date_of_birth' column. Values take the form 1977-10-24T00:00:00.000Z for example.

I want to grab the year, so I tried the following:

X['date_of_birth'] = X['date_of_birth'].apply(lambda x: int(str(x)[4:]))

This works if I am guaranteed that the first 4 letters are always integers, but it fails on my data set as some dates are messed up or garbage. Is there a way I can adjust my lambda without using regex? If not, how could I write this in regex?

Upvotes: 4

Views: 76

Answers (1)

EdChum
EdChum

Reputation: 394329

I think it would be better to just use to_datetime to convert to datetime dtype, you can drop the invalid rows using dropna and also access just the year attribute using dt.year:

In [58]:
df = pd.DataFrame({'date':['1977-10-24T00:00:00.000Z', 'duff', '200', '2016-01-01']})
df['mod_dates'] = pd.to_datetime(df['date'], errors='coerce')
df

Out[58]:
                       date  mod_dates
0  1977-10-24T00:00:00.000Z 1977-10-24
1                      duff        NaT
2                       200        NaT
3                2016-01-01 2016-01-01

In [59]:    
df.dropna()

Out[59]:
                       date  mod_dates
0  1977-10-24T00:00:00.000Z 1977-10-24
3                2016-01-01 2016-01-01

In [60]:
df['mod_dates'].dt.year

Out[60]:
0    1977.0
1       NaN
2       NaN
3    2016.0
Name: mod_dates, dtype: float64

Upvotes: 1

Related Questions