HDunn

Reputation: 533

How to replace a non-numeric, non-decimal character in a string in pandas

I have a column with values in degrees with the degree sign.

42.9377º
42.9368º
42.9359º
42.9259º
42.9341º

The degree symbol should be replaced with the digit 0.

I tried using a regex and str.replace, but I can't figure out the exact Unicode character.

The source xls has it as º.

The error message shows it as an obelus (÷).

Printing the DataFrame shows it as ?.

The position of the degree sign varies with the rounding of the decimals, so I can't replace it at a fixed string position.

Upvotes: 1

Views: 1387

Answers (2)

jezrael

Reputation: 862471

Use str.replace:

df['a'] = df['a'].str.replace('º', '0')
print(df)
          a
0  42.93770
1  42.93680
2  42.93590
3  42.92590
4  42.93410

# check the hex code of the character
print("{:02x}".format(ord('º')))
ba

# same replacement, written with the escape sequence instead of the literal character
df['a'] = df['a'].str.replace('\xba', '0')
print(df)
          a
0  42.93770
1  42.93680
2  42.93590
3  42.92590
4  42.93410
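
If it is unclear which character is actually in the data, the standard-library unicodedata module can name it. The hex value ba above is U+00BA, which Unicode calls MASCULINE ORDINAL INDICATOR; it is a different character from DEGREE SIGN (U+00B0), which looks almost identical. The following is a minimal sketch, assuming the values are already available as Python strings. The list `values` is a hypothetical stand-in for the column; with pandas you would iterate over df['a'].dropna() instead.

```python
import unicodedata

# Hypothetical sample copied from the question's values.
values = ['42.9377º', '42.9368º', '42.9359º']

# Collect every character that is neither an ASCII digit nor a decimal point.
odd_chars = sorted({ch for value in values for ch in value if ch not in '0123456789.'})

# Describe each one as (character, code point, official Unicode name).
report = [
    (ch, 'U+{:04X}'.format(ord(ch)), unicodedata.name(ch, 'UNKNOWN'))
    for ch in odd_chars
]
print(report)
```

Once the code point is known, it can be passed to str.replace as an escape such as '\xba', which avoids depending on how the editor or console encodes the literal character.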

Alternatively, extract the float with a regex and append '0':

df['a'] = df['a'].str.extract(r'(\d+\.\d+)', expand=False) + '0'
print(df)
          a
0  42.93770
1  42.93680
2  42.93590
3  42.92590
4  42.93410

Or, if the last character of every value is º, you can use indexing with str:

df['a'] = df['a'].str[:-1] + '0'
print(df)
          a
0  42.93770
1  42.93680
2  42.93590
3  42.92590
4  42.93410
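
Another option, closer to the question's title, is to replace any character that is not a digit or a decimal point, so the exact symbol never has to be typed. This is only a sketch under assumptions: the sample frame is made up, it mixes º (U+00BA) with ° (U+00B0) on purpose, and the extra column name a_float is hypothetical.

```python
import pandas as pd

# Hypothetical sample; the second value uses the real degree sign (U+00B0).
df = pd.DataFrame({'a': ['42.9377º', '42.9368°', '42.9359º']})

# Replace every character that is not an ASCII digit or a decimal point with '0'.
# regex=True is passed explicitly because newer pandas versions treat the
# pattern as a literal string by default.
df['a'] = df['a'].str.replace(r'[^0-9.]', '0', regex=True)

# Optional: convert the cleaned strings to floats for numeric work.
df['a_float'] = df['a'].astype(float)
print(df)
```

Note that this replaces every stray character, not only a trailing one, so it is best suited to columns where the symbol is the only non-numeric content.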

Upvotes: 2

Mike Scotty

Reputation: 10782

If you know that it's always the last character you could remove that character and append a "0".

s = "42.9259º"

s = s[:-1]+"0"

print(s) # 42.92590
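
If some values might lack the trailing symbol, slicing blindly would eat a real digit. A guarded variant could look like the sketch below; the function name replace_trailing_symbol and its parameters are hypothetical, and treating both º and ° as the symbol is an assumption.

```python
def replace_trailing_symbol(value, symbols=('º', '°'), replacement='0'):
    """Swap a trailing ordinal/degree symbol for `replacement`; leave other strings untouched."""
    # str.endswith accepts a tuple, so either symbol matches.
    if value.endswith(symbols):
        return value[:-1] + replacement
    return value


print(replace_trailing_symbol('42.9259º'))  # 42.92590
print(replace_trailing_symbol('42.9259'))   # 42.9259
```

On a pandas column, the same function could be applied with df['a'].map(replace_trailing_symbol), assuming the column contains only strings.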

Upvotes: 1
