pipo_exquis

Reputation: 3

Can I use a dictionary in Python to replace multiple characters?

I am looking for a way to write this code concisely. It replaces certain characters in a Pandas DataFrame column.

df['age'] = ['[70-80)' '[50-60)' '[60-70)' '[40-50)' '[80-90)' '[90-100)']

df['age'] = df['age'].str.replace('[', '')
df['age'] = df['age'].str.replace(')', '')
df['age'] = df['age'].str.replace('50-60', '50-59')
df['age'] = df['age'].str.replace('60-70', '60-69')
df['age'] = df['age'].str.replace('70-80', '70-79')
df['age'] = df['age'].str.replace('80-90', '80-89')
df['age'] = df['age'].str.replace('90-100', '90-99')

I tried this, but it didn't work. The strings in df['age'] were not replaced:

chars_to_replace = {
    '[': '',
    ')': '',
    '50-60': '50-59',
    '60-70': '60-69',
    '70-80': '70-79',
    '80-90': '80-89',
    '90-100': '90-99',
}

for key in chars_to_replace.keys():
    df['age'] = df['age'].replace(key, chars_to_replace[key])

UPDATE

As pointed out in the comments, I did forget str before replace. Adding it solved my problem, thank you!

Also, thank you tdelaney for that answer. I gave it a try and it works just as well. I am not familiar with regex substitutions yet, so I wasn't comfortable with the other options.
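
For reference, the fix mentioned in the update (adding str before replace) might look like the sketch below. It assumes one bracketed range per row, which is not how the list in the question is written, and it passes regex=False so that '[' and ')' are treated as literal characters. Both choices are assumptions added for illustration.

```python
import pandas as pd

# Assumed sample data: one bracketed range per row.
df = pd.DataFrame({'age': ['[70-80)', '[50-60)', '[60-70)',
                           '[40-50)', '[80-90)', '[90-100)']})

chars_to_replace = {
    '[': '',
    ')': '',
    '50-60': '50-59',
    '60-70': '60-69',
    '70-80': '70-79',
    '80-90': '80-89',
    '90-100': '90-99',
}

for old, new in chars_to_replace.items():
    # .str.replace works on substrings inside each value;
    # regex=False makes '[' and ')' plain characters, not regex syntax.
    df['age'] = df['age'].str.replace(old, new, regex=False)

print(df['age'].tolist())
# ['70-79', '50-59', '60-69', '40-50', '80-89', '90-99']
```

Note that '40-50' is left unchanged because the dictionary has no entry for it.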

Upvotes: 0

Views: 151

Answers (4)

geekay

Reputation: 450

Change the last part to this, so that Python's own str.replace is applied to each string in the column:

for i in df.index:
    for old, new in chars_to_replace.items():
        # Write through df.loc; chained df['age'].iloc[i] = ... may not update df.
        df.loc[i, 'age'] = df.loc[i, 'age'].replace(old, new)

Upvotes: 0

tdelaney

Reputation: 77347

Assuming these brackets are on all of the entries, you can slice them off and then replace the range strings. According to the pandas.Series.replace docs, passing a dict replaces each matching value with its mapped value, so you don't need to loop.

import pandas as pd

df = pd.DataFrame({
    "age":['[70-80)', '[50-60)', '[60-70)', '[40-50)', '[80-90)', '[90-100)']})

ranges_to_replace = {
    '50-60' : '50-59',
    '60-70' : '60-69',
    '70-80' : '70-79',
    '80-90' : '80-89',
    '90-100': '90-99'}

df['age'] = df['age'].str.slice(1,-1).replace(ranges_to_replace)
print(df)

Output

     age
0  70-79
1  50-59
2  60-69
3  40-50
4  80-89
5  90-99
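
The brackets are sliced off first because Series.replace, by default, compares whole cell values rather than searching inside them. That is also why the loop in the question changed nothing. A minimal sketch of the difference, using a made-up two-element Series:

```python
import pandas as pd

# Hypothetical data: one value with brackets, one without.
s = pd.Series(['[50-60)', '50-60'])

# Series.replace matches a cell only when the whole value equals the key.
whole = s.replace('50-60', '50-59')

# Series.str.replace searches for the substring inside each value.
sub = s.str.replace('50-60', '50-59', regex=False)

print(whole.tolist())  # ['[50-60)', '50-59']
print(sub.tolist())    # ['[50-59)', '50-59']
```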

Upvotes: 1

Barbara Gendron

Reputation: 445

In addition to the previous answer, if you want to apply the regex substitution to your DataFrame, you can use the pandas apply method. Put the regex substitution into a function, then pass that function to apply:

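import re

# Same repl as in the regex answer: decrement the upper bound of each range.
repl = lambda m: f"{m.group(1)}-{int(m.group(2))-1}"

def replace_chars(chars):
    string = re.sub(r'(\d+)-(\d+)', repl, chars)
    string = re.sub(r'\[|\)', ' ', string)  # brackets become spaces
    return string

df['age'] = df['age'].apply(replace_chars)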

print(df)

which gives the following output:

                                          age
0   70-79  50-59  60-69  40-49  80-89  90-99 

By the way, here I put spaces between the age intervals by replacing each bracket with a space instead of an empty string. Hope this helps.

Upvotes: 0

Fractalism

Reputation: 1215

Use two passes of regex substitution.

In the first pass, match each pair of numbers separated by -, and decrement the second number.

In the second pass, remove any occurrences of [ and ).

By the way, did you mean to have spaces between each pair of numbers? Because as it is now, implicit string concatenation puts them together without spaces.

import re

string = '[70-80)' '[50-60)' '[60-70)' '[40-50)' '[80-90)' '[90-100)'

def repl(m: re.Match):
    age1 = m.group(1)
    age2 = int(m.group(2)) - 1
    return f"{age1}-{age2}"

string = re.sub(r'(\d+)-(\d+)', repl, string)
string = re.sub(r'\[|\)', '', string)

print(string)  # 70-7950-5960-6940-4980-8990-99

The repl function above can be condensed into a lambda:

repl = lambda m: f"{m.group(1)}-{int(m.group(2))-1}"

Update: Actually, this can be done in one pass.

import re

string = '[70-80)' '[50-60)' '[60-70)' '[40-50)' '[80-90)' '[90-100)'

repl = lambda m: f"{m.group(1)}-{int(m.group(2))-1}"

string = re.sub(r'\[(\d+)-(\d+)\)', repl, string)

print(string)  # 70-7950-5960-6940-4980-8990-99
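
The same one-pass substitution can also be applied to a pandas column, since Series.str.replace accepts a callable replacement when regex=True. The sketch below assumes one bracketed range per row; the variable name ages is made up for illustration.

```python
import pandas as pd

# Assumed sample data: one bracketed range per row.
ages = pd.Series(['[70-80)', '[50-60)', '[60-70)',
                  '[40-50)', '[80-90)', '[90-100)'])

def repl(m):
    # Keep the lower bound, decrement the upper bound, drop the brackets.
    return f"{m.group(1)}-{int(m.group(2)) - 1}"

# A callable replacement requires regex=True.
ages = ages.str.replace(r'\[(\d+)-(\d+)\)', repl, regex=True)

print(ages.tolist())
# ['70-79', '50-59', '60-69', '40-49', '80-89', '90-99']
```

Unlike the dictionary approach, this also converts '40-50' to '40-49', because it computes the new upper bound instead of looking it up.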

Upvotes: 2
