Reputation: 3148
I'm trying to remove all characters except digits from a DataFrame column. The column has object dtype and holds mixed values for age, for example: '44', '60', 'July 89', 'August 42'.
Here is the approach I'm using with extract():
data['age'] = data.age.str.extract(r'(\d+)')
For some reason the resulting column contains NaN values. I tried other approaches (for example, the replace() function), but the issue stays the same. Can you give me a hint for fixing this? Thanks!
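A minimal reproduction of the problem, assuming the column really does mix Python ints with strings (the sample values below are hypothetical, modeled on the ones in the question). It uses expand=False so that extract returns a Series rather than a one-column DataFrame:

```python
import pandas as pd

# Hypothetical sample data: an object column mixing ints and strings.
data = pd.DataFrame({"age": [44, 60, "July 89", "August 42"]})
print(data["age"].dtype)  # object

# The .str accessor only operates on str elements; the integer entries
# 44 and 60 are not strings, so extract yields NaN for them.
extracted = data["age"].str.extract(r"(\d+)", expand=False)
print(extracted.tolist())
```

Only the string entries ('July 89', 'August 42') produce a match; the plain numbers come back as NaN.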
Upvotes: 1
Views: 421
Reputation: 626728
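Regular expression methods such as str.extract() only work on string values. Your column appears to be of mixed type, containing both numbers and strings, and the .str methods return NaN for every element that is not a string.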
Cast all values to strings first; then you can extract the digits:
data['age'] = data['age'].astype(str).str.extract(r'(\d+)')
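A self-contained sketch of the fix, again assuming hypothetical sample data that mixes ints and strings. Passing expand=False (an existing str.extract parameter) returns a Series, which assigns cleanly back to a single column:

```python
import pandas as pd

# Hypothetical sample data mixing ints and strings in one object column.
data = pd.DataFrame({"age": [44, 60, "July 89", "August 42"]})

# Convert every element to str so the .str accessor sees only strings,
# then keep the first run of digits from each value.
data["age"] = data["age"].astype(str).str.extract(r"(\d+)", expand=False)
print(data["age"].tolist())  # ['44', '60', '89', '42']
```

The extracted values are still strings; pd.to_numeric(data["age"]) converts them if you need numbers.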
Upvotes: 3