Reputation: 37
I have a df that looks something like this:
col1 | col2 | col3 |
---|---|---|
80% | 10% | SP |
90% | 0% | SP |
90% | 10% | SP |
70% | SP | 20% |
90% | SP | 0% |
As you can see, the values have a % sign appended to them. I would usually remove this with pd.to_numeric() and df['col2'].str.rstrip('%').astype('float') / 100. However, I cannot do that here because the columns also contain strings such as SP, which throw an error during the conversion.
Any ideas on how to do this?
Upvotes: 1
Views: 269
Reputation: 31146
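A regular expression replace() works. I also applied astype() so that columns that can become fully numeric are cast to float. Note that rewriting NN% as .NN is only correct for two-digit percentages: 5% would become .5 and 100% would become .100.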
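import io
import pandas as pd

# Sample data from the question; r"\s+" accepts tabs or spaces as the separator
df = pd.read_csv(io.StringIO("""col1 col2 col3
80% 10% SP
90% 0% SP
90% 10% SP
70% SP 20%
90% SP 0%"""), sep=r"\s+")
# Rewrite "NN%" as ".NN", then cast the columns that no longer contain strings
df.replace("^([0-9]+)%$", r".\1", regex=True).astype("float64", errors="ignore")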
col1 col2 col3
0.8 .10 SP
0.9 .0 SP
0.9 .10 SP
0.7 SP .20
0.9 SP .0
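If the percentages are not all two digits, converting numerically may be safer than rewriting the string. Here is a minimal sketch of that idea, not part of the original answer: percent_to_fraction is a hypothetical helper name, and it assumes the non-numeric markers (such as SP) should be left untouched.

```python
import io

import pandas as pd

# Sample data copied from the question.
df = pd.read_csv(io.StringIO("""col1 col2 col3
80% 10% SP
90% 0% SP
90% 10% SP
70% SP 20%
90% SP 0%"""), sep=r"\s+")


def percent_to_fraction(col: pd.Series) -> pd.Series:
    """Hypothetical helper: turn 'NN%' strings into fractions, keep other values."""
    as_text = col.astype(str)
    # Strip a trailing '%' and parse; entries such as 'SP' become NaN.
    numbers = pd.to_numeric(as_text.str.rstrip("%"), errors="coerce") / 100
    # Only touch cells that really ended in '%' and parsed cleanly.
    is_percent = as_text.str.endswith("%") & numbers.notna()
    # Cast to object first so floats and strings can share one column.
    return col.astype(object).mask(is_percent, numbers)


# infer_objects() gives a float dtype to columns that ended up fully numeric.
result = df.apply(percent_to_fraction).infer_objects()
print(result)
```

Because the division happens on real numbers, values like 5% and 100% come out as 0.05 and 1.0. Columns that still contain SP stay as object dtype, holding a mix of floats and strings.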
Upvotes: 0