Dijkie85

Reputation: 1106

Replace multiple strings with multiple values in Julia

In Python's pandas, you can pass a dictionary to df.replace to replace every matching key with its corresponding value. I use this feature a lot to expand Spanish word abbreviations that mess up sentence tokenizers.

Is there something similar in Julia? Or, even better, so that I (and future users) can learn from it, any ideas on how to implement such a function in Julia's beautiful and performant syntax?

Thank you!

Edit: Adding an example as requested

Input:

julia> DataFrames.DataFrame(Dict("A" => ["This is an ex.", "This is a samp.", "This is a samp. of an ex."]))
3×1 DataFrame
 Row │ A
     │ String
─────┼───────────────────────────
   1 │ This is an ex.
   2 │ This is a samp.
   3 │ This is a samp. of an ex.

Desired output:

3×1 DataFrame
 Row │ A
     │ String
─────┼────────────────────────────────
   1 │ This is an example
   2 │ This is a sample
   3 │ This is a sample of an example

Upvotes: 5

Views: 846

Answers (1)

Bogumił Kamiński

Reputation: 69949

In Julia, the function for this is also called replace. It takes a collection and replaces elements in it. The simplest form is:

julia> x = ["a", "ab", "ac", "b", "bc", "bd"]
6-element Vector{String}:
 "a"
 "ab"
 "ac"
 "b"
 "bc"
 "bd"

julia> replace(x, "a" => "aa", "b" => "bb")
6-element Vector{String}:
 "aa"
 "ab"
 "ac"
 "bb"
 "bc"
 "bd"

If you have a more complex replacement pattern, you can pass a function that does the replacement:

julia> replace(x) do s
           length(s) == 1 ? s^2 : s
       end
6-element Vector{String}:
 "aa"
 "ab"
 "ac"
 "bb"
 "bc"
 "bd"

There is also replace!, which does the same in place.
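
To make the difference concrete, here is a small sketch contrasting the two on the same vector as above. The variable names y and x_was_untouched are only illustrative.

```julia
x = ["a", "ab", "ac", "b", "bc", "bd"]

# replace returns a new vector and leaves x as it was
y = replace(x, "a" => "aa", "b" => "bb")
x_was_untouched = (x[1] == "a")

# replace! mutates x itself and returns it
replace!(x, "a" => "aa", "b" => "bb")
```

After the last call, x and y hold the same elements, but y is a separate vector that was allocated by replace.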

Is this what you wanted?

EDIT

Replacement of substrings in a vector of strings:

julia> df = DataFrame("A" => ["This is an ex.", "This is a samp.", "This is a samp. of an ex."])
3×1 DataFrame
 Row │ A
     │ String
─────┼───────────────────────────
   1 │ This is an ex.
   2 │ This is a samp.
   3 │ This is a samp. of an ex.

julia> df.A .= replace.(df.A, "ex." => "example", "samp." => "sample")
3-element Vector{String}:
 "This is an example"
 "This is a sample"
 "This is a sample of an example"

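Since the question mentions passing a dictionary, as in pandas, the same call can probably be driven from a Dict by splatting it into pairs. This is only a sketch: it uses a plain vector instead of a data frame column, the names abbreviations, sentences and expanded are hypothetical, and it assumes Julia 1.7 or later, where replace on a string accepts several pairs at once.

```julia
# Hypothetical lookup table; the name and contents are illustrative only.
abbreviations = Dict("ex." => "example", "samp." => "sample")

sentences = ["This is an ex.", "This is a samp.", "This is a samp. of an ex."]

# Splatting a Dict passes each key => value pair as a separate argument.
# Passing several pairs to replace on a string assumes Julia 1.7 or later.
expanded = replace.(sentences, abbreviations...)
```

A Dict has no guaranteed iteration order. That does not matter here because the two patterns cannot match at the same position, but if your abbreviations can overlap, a Vector of pairs is the safer container because it keeps the order you wrote.
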
Note two things:

  1. You do not need to pass a Dict to the DataFrame constructor. It is enough to pass pairs.
  2. In the assignment I used .= rather than =, which performs an in-place replacement of the values in the already existing vector. (I show it for comparison with what @Sundar R proposed in a comment, an alternative that allocates a new vector; the difference probably does not matter much in your case, but I wanted to show you both syntaxes.)
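
The difference between the two syntaxes in point 2 can be sketched on a plain vector, where .= definitely writes into the existing storage. The names v and alias are hypothetical, and how df.A .= ... treats a data frame column may depend on your DataFrames.jl and Julia versions, so check that separately.

```julia
v = ["This is an ex.", "This is a samp."]
alias = v   # a second name for the same vector

# .= writes the new values into the existing vector, so alias sees them
v .= replace.(v, "ex." => "example")
same_after_broadcast = (alias === v)

# = binds v to a freshly allocated vector; alias still points at the old one
v = replace.(v, "samp." => "sample")
same_after_assignment = (alias === v)
```

Here same_after_broadcast is true and same_after_assignment is false, which is the allocation difference mentioned above.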

Upvotes: 4
