JA-pythonista

Reputation: 1323

More efficient way to match strings in a Pandas DataFrame

I have a column of a Pandas DataFrame called "Steuersatz".

This column is made of the following unique strings:

array(['19,00%', '0,00%', '5,00%', '4,64%', '4,04%', '4,10%', '1,63%', '3,55%',
       '1,14%', '0,96%', '11,31%', '12,35%', '10,45%', '11,00%', '12,99%',
       '10,83%', '6,82%', '11,50%', '16,00%', '3,30%', '4,00%', '4,16%',
       '4,15%', '10,38%', '11,43%', '11,58%'], dtype=object)

I want to rewrite the values so that any rate ending in ,00 is shown without the decimals: 19,00% should become 19%, 5,00% should become 5%, and so on. Rates such as 4,64% should stay as they are.

Here is what I am doing to solve this problem:

df["Steuersatz"] = df["Steuersatz"].map("{:,.2f}%".format)
df["Steuersatz"] = df["Steuersatz"].str.replace(".", ",", regex=False)
df['Steuersatz'] = df['Steuersatz'].str.replace("19,00%","19%")
df['Steuersatz'] = df['Steuersatz'].str.replace("0,00%","0%")
df['Steuersatz'] = df['Steuersatz'].str.replace("11,00%","11%")
df['Steuersatz'] = df['Steuersatz'].str.replace("5,00%","5%")
df['Steuersatz'] = df['Steuersatz'].str.replace("4,00%","4%")
df['Steuersatz'] = df['Steuersatz'].str.replace("16,00%","16%")

This feels inefficient to me. I would like to do it automatically rather than adding a manual replacement for every value.

Many thanks for your input

Upvotes: 0

Views: 100

Answers (1)

Lukas Thaler

Reputation: 2720

Why not just replace ,00 with an empty string? pd.Series.str.replace matches substrings, so it can perform partial matching (it also supports regex, which was the default before pandas 2.0):

df['Steuersatz'] = df['Steuersatz'].str.replace(",00","")

This not only removes several repeated lines from your code but also handles new cases, such as 23,00%.
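
One caveat: a plain ,00 replacement matches anywhere in the string, not only at the end. If that matters for your data, a minimal sketch of an anchored variant could look like the following. The sample Series is made up from values in the question, and "1,000,50%" is a hypothetical value of the kind the question's formatting code would produce for 1000.5.

```python
import pandas as pd

# Sample values from the question, plus one hypothetical value
# ("1,000,50%") that contains ",00" in the middle of the string.
s = pd.Series(["19,00%", "0,00%", "4,64%", "11,00%", "10,45%", "1,000,50%"])

# Anchor the pattern with $ so only a trailing ",00%" is rewritten.
# regex=True is passed explicitly because the default changed in pandas 2.0.
cleaned = s.str.replace(r",00%$", "%", regex=True)

print(cleaned.tolist())
```

With the unanchored version, "1,000,50%" would become "10,50%", whereas the anchored pattern leaves it untouched. For a DataFrame, the same call would be applied to df['Steuersatz'].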

Upvotes: 2
