Reputation: 2296
A column in my pandas data frame consists of both numbers and string values. I tried a couple of answers, yet I get NaN values for the cells that already only contain numerical value.
Example
Column1
Oh great, 10,7
Awesome, 20.3
1.5
Vaaw, 9/10
100
Code:
df['Column1'] = df['Column1'].str.extract('([0-9][,.]*[0-9]*)')
for the third and fourth rows for instance I get Nan. I am fine with keeping 9/10 for the fourth row
Upvotes: 0
Views: 1596
Reputation: 2776
You were close.
df['Column1'] = df['Column1'].str.extract('([0-9]+[,./]*[0-9]*)')
#df
Column1
0 10,7
1 20.3
2 1.5
3 9/10
Another possible regex would be the following. It will yield identical results in this example but is a bit more "general". It says find digits, followed (possibly) by anything other than whitespace, followed by (possibly) more digits. It would fail for a string like "hello 10f7"
by returning 10f7
. So it depends on what you know about the quality of your data.
df['Column1'] = df['Column1'].str.extract('(\d+\W*\d*)')
Upvotes: 3
Reputation: 16876
If your Column1
is of type object (ie. it contains both string and numbers) then you will have to first force it to string type and use extract
df = pd.DataFrame({'Column1': [
'Oh great, 10,7',
'Awesome, 20.3',
'1.5',
1.5,
'Vaaw, 9/10']})
print (df['Column1'].astype(str).str.extract('([0-9]+[,./]*[0-9]*)'))
Output:
0
0 10,7
1 20.3
2 1.5
3 1.5
4 9/10
However, if your Column1
contain only strings then you just have to correct your regex:
df = pd.DataFrame({'Column1': [
'Oh great, 10,7',
'Awesome, 20.3',
'1.5',
'Vaaw, 9/10']})
print (df['Column1'].str.extract('([0-9]+[,./]*[0-9]*)'))
Output:
0
0 10,7
1 20.3
2 1.5
3 9/10
Upvotes: 4
Reputation: 150745
Let's try:
df['Value'] = df['Column1'].str.extract('([\d/,\.]+)$')[0]
Or even simpler:
df['Value'] = df['Column1'].str.extract('(\S+)$')[0]
Output:
Column1 Value
0 Oh great, 10,7 10,7
1 Awesome, 20.3 20.3
2 1.5 1.5
3 Vaaw, 9/10 9/10
Upvotes: 4
Reputation: 26676
User regex digit\special charcter(.)\digit
df['num']=df.Column1.str.extract('(\d+\.\d+)')
Column1 num
0 Oh great, 10,7 NaN
1 Awesome, 20.3 20.3
2 1.5 1.5
3 Vaaw, 9/10 NaN
Upvotes: 3