Chick Chirik

Reputation: 115

How do you remove certain symbols from a string?

I'm having trouble with a regular expression. I have a dataset of money amounts, and in some rows there is an extra separator. I need a regular expression that removes only the extra separator and keeps the decimal point.

For example, this is the data I have:

user_id sum
1       10.10
2       154.24
3       19.565.02
4       2.142.00
5       1.99

And the expected result is:

user_id sum
1       10.10
2       154.24
3       19565.02
4       2142.00
5       1.99

I use Python and the pandas library for data analysis.

Please help with the regex. Thank you!

Upvotes: 1

Views: 78

Answers (2)

Erfan

Reputation: 42916

A slightly different way, with conditional column creation using np.where from the numpy module (this removes only the first dot, so it handles values with exactly one extra separator):

df['sum'] = np.where(df.sum_col.str.count(r'\.') >= 2, df.sum_col.str.replace('.', '', n=1, regex=False), df.sum_col)

or, for any number of dots:

# remove all but the last '.' from each value, then convert to numbers
df['sum'] = pd.to_numeric([i.replace('.', '', x) for i, x in
                           zip(df['sum_col'], df['sum_col'].str.count(r'\.') - 1)])

Returns:

    sum_col     sum
0   10.10       10.10
1   154.24      154.24
2   19.565.02   19565.02
3   2.142.00    2142.00

The sum column is the cleaned-up column.
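
Since the question asks for a regex, the same cleanup can also be sketched with a single pattern: a lookahead that matches a dot only when another dot follows it later in the string, so the last dot (the decimal point) is left alone. This is a minimal sketch using the standard re module; the names EXTRA_DOT and strip_extra_dots are made up for illustration.

```python
import re

# Matches a dot only if another dot appears later in the same string,
# so the final dot (the decimal point) is never matched.
EXTRA_DOT = re.compile(r'\.(?=.*\.)')

def strip_extra_dots(value):
    """Remove every '.' except the last one (hypothetical helper)."""
    return EXTRA_DOT.sub('', value)

amounts = ['10.10', '154.24', '19.565.02', '2.142.00']
cleaned = [strip_extra_dots(a) for a in amounts]
print(cleaned)
```

The same pattern should work on a pandas column through Series.str.replace with regex=True, assuming the column holds strings.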

Upvotes: 1

Julian

Reputation: 1143

Well, if your data is always formatted with 2 decimal places at the end, you can skip the regex and just use plain Python string methods.

For example, let's say you get all your data into a list (excluding the header row). You can then do the following to fix the dataset:

dirty = ['10.10', '154.24', '19.565.02', '2.142.00', '1.99']
# this is a list comprehension that removes any '.' from everything
# except the last three characters of each string
clean = [item[:-3].replace('.', '') + item[-3:] for item in dirty]

>>> clean
['10.10', '154.24', '19565.02', '2142.00', '1.99']
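
The slice above relies on every value ending in exactly two decimal digits. If that cannot be guaranteed, a variant that splits on the last dot instead might look like the sketch below. It still skips the regex; keep_last_dot is a hypothetical helper name, and '1.234.5' is a made-up value added only to show the one-decimal case.

```python
def keep_last_dot(value):
    """Drop every '.' except the last one, whatever the number of decimals."""
    # rpartition splits on the LAST '.'; if there is no dot at all,
    # head and sep are empty and the value comes back unchanged.
    head, sep, tail = value.rpartition('.')
    return head.replace('.', '') + sep + tail

# '1.234.5' is an invented example with a single decimal digit
dirty = ['10.10', '154.24', '19.565.02', '2.142.00', '1.99', '1.234.5']
clean = [keep_last_dot(item) for item in dirty]
print(clean)
```

The trade-off is that this treats the last dot as the decimal point, so a value like '2.142' with no decimal part would be read as 2.142 rather than 2142.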

Answer updated thanks to @match.

Upvotes: 3
