DidSquids

Reputation: 199

Conditional replace comma or spaces in number string in Pandas DataFrame column without a loop

Sometimes the numeric strings in my DataFrames contain commas, which represent either the decimal separator or the thousands separator; some contain no commas at all. The DataFrame below is an example of the range of price formats I receive via an API; the format varies depending on the currency. These are prices, so there are always two decimal places. I need to convert the string prices to float so I can sum them, separate them into other DataFrames, or use them for plotting graphs. I have written a loop to replace them, but is there a quicker way to do this without the loop?

My DataFrame and working loop are as follows:

import numpy as np
import pandas as pd

data = {'amount': ['7,99', '6,99', '9.99', '-6,99', '1,000.00']}
df = pd.DataFrame(data)

fees = []
sales = []

for items in df['amount']:
    if items[-7:-6] == ',':
        # comma marks the thousands, e.g. '1,000.00'
        items = items.replace(',', '').replace(' ', '')
    elif items[-3:-2] == ',':
        # comma marks the decimal, e.g. '7,99'
        items = items.replace(',', '.').replace(' ', '')
    items = float(items)
    if items <= 0:
        fees.append(items)
    else:
        sales.append(items)

I have attempted to do this without the loop but can't seem to work out where I have gone wrong.

df["amount"] = np.where((df['amount'][-7:-6] == ','),
                       df["amount"][-7:-6].str.replace(',', '').replace(' ',''),
                       df["amount"])

df["amount"] = np.where((df['amount'][-3:-2] == ','),
                       df["amount"][-3:-2].str.replace(',', '').replace(' ',''),
                       df["amount"])
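One likely cause: `df['amount'][-7:-6]` slices the rows of the Series, not the characters of each string, so the comparison never looks at a single character. Character-level operations go through the `.str` accessor. A minimal sketch of the same `np.where` idea, assuming the only formats are the ones in the sample data (the `amount_float` column name is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'amount': ['7,99', '6,99', '9.99', '-6,99', '1,000.00']})

# Remove spaces first; .str methods act on the characters of each string.
s = df['amount'].str.replace(' ', '', regex=False)

# A comma three characters from the end is treated as a decimal comma.
decimal_comma = s.str[-3] == ','

cleaned = np.where(
    decimal_comma,
    s.str.replace(',', '.', regex=False),  # '7,99' -> '7.99'
    s.str.replace(',', '', regex=False),   # '1,000.00' -> '1000.00'
)

# np.where returns a plain array, so wrap it back into a Series before casting.
df['amount_float'] = pd.Series(cleaned, index=df.index).astype(float)
```

This sketch would not handle a value such as '1.000,00', which does not appear in the sample.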

Any help would be much appreciated. Thank you in advance

Upvotes: 2

Views: 4306

Answers (3)

ManojK

Reputation: 1640

Since you mention that the last two digits are always the decimals, the ',' needs to be replaced with '.' to convert the value to float. However, you also have values like 1,000.00 that would become invalid if every ',' were replaced with '.', so you can use a regex to identify which values to replace:

data = {'amount': ['7,99', '6,99', '9.99', '-6,99', '1,000.00']}
df = pd.DataFrame(data)
df

First, the regex matches every string that ends with ',' followed by two digits; the replace function then substitutes the match with a '.' and the captured digits (99 from ,99):

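df['amount'] = df['amount'].str.replace(r'(,)(\d{2}$)', r'.\2', regex=True)
# `\2` in the replacement refers to the second captured group in the regex (the two digits)
# regex=True is needed on pandas 2.0+, where str.replace defaults to literal matching

Then, to convert 1,000.00 to float, we remove the remaining ',':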

df['amount'] = df['amount'].str.replace(',','')

And then convert the data type to float

df['amount'] = df['amount'].astype(float)

print(df)
    amount
0     7.99
1     6.99
2     9.99
3    -6.99
4  1000.00
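The steps above can also be chained into one expression, followed by the fees/sales split the question asks for. This is only a sketch under the assumption that the inputs look like the sample data; `regex=True` is passed explicitly, and the names `amount`, `fees`, and `sales` are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'amount': ['7,99', '6,99', '9.99', '-6,99', '1,000.00']})

amount = (
    df['amount']
    .str.replace(' ', '', regex=False)              # drop any spaces
    .str.replace(r',(\d{2})$', r'.\1', regex=True)  # trailing ',99' -> '.99'
    .str.replace(',', '', regex=False)              # remaining commas are thousands marks
    .astype(float)
)

# Boolean masks replace the if/else inside the loop.
fees = amount[amount <= 0]
sales = amount[amount > 0]
```

Both `fees` and `sales` stay as Series, so they can be summed or plotted directly.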

Upvotes: 2

Vishnudev Krishnadas

Reputation: 10960

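Try using split and join. This removes every comma, so it suits values where the comma is a thousands separator; decimal commas such as '7,99' are read as 799.00: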

df.amount.str.split(',').str.join('').astype(float)

Output

0     799.00
1     699.00
2       9.99
3    -699.00
4    1000.00
Name: amount, dtype: float64

Upvotes: 0

sigma1510

Reputation: 1281

You can use lambdas instead of numpy:

lambda1 = lambda items: float(str(items).replace(',', '').replace(' ','')) if str(items)[-7:-6] == ',' else items
lambda2 = lambda items: float(str(items).replace(',', '.').replace(' ','')) if str(items)[-3:-2] == ',' else items
to_float = lambda items: float(items)

df['amount_clean'] = df["amount"].map(lambda1).map(lambda2).map(to_float) 

=========================================================================

Edit: what are lambdas

In Python, lambda functions are small anonymous functions consisting of a single expression (see https://www.w3schools.com/python/python_lambda.asp)

Example with condition:

lambda x: x + 1 if x < 0 else x

This is equivalent to:

def my_lambda_function(x):
    if x < 0:
        return x + 1
    else:
        return x

When passed to the column of a pandas dataframe via the map function, the lambda expression will be applied to the value in each row of the column.
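As an illustration of the same idea, the chained lambdas can be folded into one named function and passed to `map`. This is a sketch only; `parse_price` is a hypothetical helper, not part of pandas, and it assumes the formats shown in the question:

```python
import pandas as pd

def parse_price(text):
    """Hypothetical helper: convert one price string to a float."""
    text = str(text).replace(' ', '')
    if text[-3:-2] == ',':
        # Comma in the decimal position, e.g. '7,99'
        text = text.replace(',', '.')
    else:
        # Any other comma is a thousands mark, e.g. '1,000.00'
        text = text.replace(',', '')
    return float(text)

df = pd.DataFrame({'amount': ['7,99', '6,99', '9.99', '-6,99', '1,000.00']})

# map calls parse_price once per row of the column.
df['amount_clean'] = df['amount'].map(parse_price)
```

A named function is easier to test in isolation than a chain of lambdas, although it still runs row by row rather than vectorized.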

Hope this helps!

Upvotes: 1
