Jasper
Jasper

Reputation: 2231

Pandas: cant strip '-' from number in CSV

I have the following problem, I have two separate python scripts. The first converts a CSV file to a new format (dropping certain columns and removing dollar signs)

A separate scripts reads all of the converted CSV files and create a single large script from those, grouping the data by certain keys.

All of this works well enough, except for the fact that I keep running into the following problem.

One of the columns in both files Money Out sometimes contains values starting with a '-' (-235.93). I want to remove this symbol from the file.

I've put the following code in both scripts, yet I am unable to remove the symbol. The data is sometimes numeric and sometimes a string. I thought that unifying it to string and using the 'replace' method would solve the problem, yet it didnt. I also used the build in pandas method, yet to no avail.

df['Money Out'] = df['Money Out'].astype(str)
df['Money Out'] = df['Money Out'].replace('-', '')
df['Money Out'].replace('-', '', inplace=True)

Does anyone have an idea of what I might be doing wrong?

Upvotes: 0

Views: 83

Answers (2)

Jon Clements
Jon Clements

Reputation: 142106

Your code:

df['Money Out'].replace('-', '', inplace=True)

is using Series.replace which is a different thing (replaces an exact match of the column or a regex with a value) than the Series.str.replace function (more traditional replace a substring present anywhere with another), you can use:

df['Money Out'] = df['Money Out'].str.replace('-', '')

to remove any hyphens from your text.

Upvotes: 1

Abdelmajid ELHAMDAOUI
Abdelmajid ELHAMDAOUI

Reputation: 325

Try to do this

df['Money Out'] = df.apply(lambda r: ("%s"%(r['Money Out'] or "0.0")).replace("-",""), axis=1)

Upvotes: 1

Related Questions