DBOak

Reputation: 145

Replace part of a string with values from a dictionary?

Loving the Polars library for its fantastic speed and easy syntax!

Is there an analogue in Polars for the Pandas code below? I would like to replace substrings using a dictionary.

I tried this expression, but it raises "TypeError: 'dict' object is not callable":

pl.col("List").str.replace_all(lambda key: key,dict())

This is the working Pandas code that I am trying to replace with a Polars expression:

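import pandas as pd

df = pd.DataFrame({'List': ['Systems', 'Software', 'Cleared']})

# Map each substring to its replacement
dic = {
    'Systems': 'Sys',
    'Software': 'Soft',
    'Cleared': 'Clr',
}

# regex=True treats the keys as patterns and replaces matching substrings
df["List"] = df["List"].replace(dic, regex=True)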

Output:

 List
 0   Sys
 1  Soft
 2   Clr

Upvotes: 4

Views: 1396

Answers (2)

jqurious

Reputation: 21544

There is a "stale" feature request for accepting a dictionary.

One possible workaround is to stack multiple expressions in a loop:

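import polars as pl

df = pl.DataFrame({"List": ["Systems", "Software", "Cleared"]})

# Chain one replace_all call per dictionary entry
expr = pl.col("List")

for old, new in dic.items():
    expr = expr.str.replace_all(old, new)

df.with_columns(result=expr)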
shape: (3, 2)
┌──────────┬────────┐
│ List     ┆ result │
│ ---      ┆ ---    │
│ str      ┆ str    │
╞══════════╪════════╡
│ Systems  ┆ Sys    │
│ Software ┆ Soft   │
│ Cleared  ┆ Clr    │
└──────────┴────────┘

For non-regex cases, there is also .str.replace_many():

df.with_columns(
    pl.col("List").str.replace_many(
        ["Systems", "Software", "Cleared"],
        ["Sys", "Soft", "Clr"]
    )
    .alias("result")
)

Upvotes: 4

Dean MacGregor

Reputation: 18671

I think your best bet would be to turn your dic into a dataframe and join the two.

First convert your dic into a format that builds a DataFrame cleanly. A list of dicts works:

dicdf = pl.DataFrame([{'List': x, 'newList': y} for x, y in dic.items()])

Here List is your existing column name, and newList is an arbitrary name for the new column, which gets renamed later.

Then join that with your original df and select every column except the old List, plus newList renamed to List:

df = (
    df.join(dicdf, on='List')
    .select([
        pl.exclude(['List', 'newList']),
        pl.col('newList').alias('List'),
    ])
)

Upvotes: 1
