SteveS

Reputation: 4040

How to extract numbers from mixed dataframe column and replace with numbers only (inplace)?

Given the following toy dataframe:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A':['1a',np.nan,'10a','100b','0b'],
                   })
df

    A
0   1a
1   NaN
2   10a
3   100b
4   0b

I want to remove all the non-digit characters and keep only the numbers in column A. I know some methods accept an inplace=True parameter, but how can I extract the numbers and replace the column values in place?

I want to get:

    A
0   1
1   NaN
2   10
3   100
4   0

Here is how I am doing it now:

df.A = df.A.str.extract(r'(\d+)')

Upvotes: 0

Views: 325

Answers (2)

Quang Hoang

Reputation: 150745

str.extract, as the name suggests, only extracts; it doesn't replace. Try:

df['A'].replace(r'(\D.*)', '', inplace=True, regex=True)

Output:

     A
0    1
1  NaN
2   10
3  100
4    0

More info on the regex pattern here. Basically:

  1. \D matches any non-digit character
  2. .* matches all the characters that follow \D.

So the pattern replaces everything from the first non-digit character with the empty string ''.
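
As a minimal sketch of the same pattern on the toy dataframe from the question: this variant assigns the result back to the column instead of passing inplace=True, which is an assumption on my part about what behaves most predictably across pandas versions, not part of the original answer. The final pd.to_numeric step is optional and only there in case real numbers (rather than digit strings) are wanted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# Cut each string from its first non-digit character to the end.
# NaN is not a string, so it is left untouched.
df['A'] = df['A'].replace(r'\D.*', '', regex=True)
print(df['A'].tolist())   # ['1', nan, '10', '100', '0']

# Optional: the values above are still strings; convert them to numbers.
nums = pd.to_numeric(df['A'])
print(nums.tolist())      # [1.0, nan, 10.0, 100.0, 0.0]
```

Because of the NaN, the numeric column comes out as float64 rather than an integer dtype.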

Upvotes: 3

RavinderSingh13

Reputation: 133518

With your shown samples, please try the following. It uses the replace function of pandas with regex=True; the pattern matches every run of non-digit characters and replaces it with an empty string. Note that this call returns a new Series, so assign the result back to df['A'] if you want to keep it.

df['A'].replace('([^0-9]*)','', regex=True)
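
The two patterns in this thread agree on the sample data but differ once digits appear after the first letter. The sketch below compares them on a small made-up Series (the values 'a1b2' and '3.5kg' are hypothetical, chosen only to show the difference):

```python
import pandas as pd

s = pd.Series(['10a', 'a1b2', '3.5kg'])

# Pattern from the other answer: cut from the first non-digit to the end.
cut_tail = s.replace(r'(\D.*)', '', regex=True)
print(cut_tail.tolist())          # ['10', '', '3']

# Pattern from this answer: delete every non-digit, keep all the digits.
strip_non_digits = s.replace('([^0-9]*)', '', regex=True)
print(strip_non_digits.tolist())  # ['10', '12', '35']
```

Which one is appropriate depends on the data: the first keeps only the leading number, the second concatenates every digit in the string.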

Upvotes: 3
