edyvedy13
edyvedy13

Reputation: 2296

Extracting numbers with decimals from string in pandas

A column in my pandas data frame consists of both numbers and string values. I tried a couple of answers, yet I get NaN values for the cells that already only contain numerical value.

Example

Column1
Oh great, 10,7
Awesome, 20.3
1.5
Vaaw, 9/10
100

Code:

df['Column1'] = df['Column1'].str.extract('([0-9][,.]*[0-9]*)')

for the third and fourth rows for instance I get Nan. I am fine with keeping 9/10 for the fourth row

Upvotes: 0

Views: 1596

Answers (4)

noah
noah

Reputation: 2776

You were close.

df['Column1'] = df['Column1'].str.extract('([0-9]+[,./]*[0-9]*)')
#df
  Column1
0    10,7
1    20.3
2     1.5
3    9/10

Another possible regex would be the following. It will yield identical results in this example but is a bit more "general". It says find digits, followed (possibly) by anything other than whitespace, followed by (possibly) more digits. It would fail for a string like "hello 10f7" by returning 10f7. So it depends on what you know about the quality of your data.

df['Column1'] = df['Column1'].str.extract('(\d+\W*\d*)')

Upvotes: 3

mujjiga
mujjiga

Reputation: 16876

If your Column1 is of type object (ie. it contains both string and numbers) then you will have to first force it to string type and use extract

df = pd.DataFrame({'Column1': [
                               'Oh great, 10,7', 
                               'Awesome, 20.3', 
                               '1.5', 
                               1.5, 
                               'Vaaw, 9/10']})

print (df['Column1'].astype(str).str.extract('([0-9]+[,./]*[0-9]*)'))

Output:

      0
0  10,7
1  20.3
2   1.5
3   1.5
4  9/10

However, if your Column1 contain only strings then you just have to correct your regex:

df = pd.DataFrame({'Column1': [
                               'Oh great, 10,7', 
                               'Awesome, 20.3', 
                               '1.5', 
                               'Vaaw, 9/10']})

print (df['Column1'].str.extract('([0-9]+[,./]*[0-9]*)'))

Output:

      0
0  10,7
1  20.3
2   1.5
3  9/10

Upvotes: 4

Quang Hoang
Quang Hoang

Reputation: 150745

Let's try:

df['Value'] = df['Column1'].str.extract('([\d/,\.]+)$')[0]

Or even simpler:

df['Value'] = df['Column1'].str.extract('(\S+)$')[0]

Output:

          Column1 Value
0  Oh great, 10,7  10,7
1   Awesome, 20.3  20.3
2             1.5   1.5
3      Vaaw, 9/10  9/10

Upvotes: 4

wwnde
wwnde

Reputation: 26676

User regex digit\special charcter(.)\digit

df['num']=df.Column1.str.extract('(\d+\.\d+)')

          Column1   num
0  Oh great, 10,7   NaN
1   Awesome, 20.3  20.3
2             1.5   1.5
3      Vaaw, 9/10   NaN

Upvotes: 3

Related Questions