In Python pandas you can pass a dictionary to df.replace in order to replace every matching key with its corresponding value. I use this feature a lot to replace word abbreviations in Spanish that mess up sentence tokenizers.
Is there something similar in Julia? Or, even better, so that I (and future users) may learn from the experience, are there any ideas on how to implement such a function in Julia's beautiful and performant syntax?
Thank you!
Edit: Adding an example as requested
Input:
julia> DataFrames.DataFrame(Dict("A" => ["This is an ex.", "This is a samp.", "This is a samp. of an ex."]))
3×1 DataFrame
Row │ A
│ String
─────┼────────────────────
1 │ This is an ex.
2 │ This is a samp.
3 │ This is a samp. of an ex.
Desired output:
3×1 DataFrame
Row │ A
│ String
─────┼────────────────────
1 │ This is an example
2 │ This is a sample
3 │ This is a sample of an example
In Julia the function for this is also replace. It takes a collection and replaces elements in it. The simplest form is:
julia> x = ["a", "ab", "ac", "b", "bc", "bd"]
6-element Vector{String}:
"a"
"ab"
"ac"
"b"
"bc"
"bd"
julia> replace(x, "a" => "aa", "b" => "bb")
6-element Vector{String}:
"aa"
"ab"
"ac"
"bb"
"bc"
"bd"
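Since the question starts from a pandas-style dictionary, it may help to note that iterating a Dict yields key => value Pairs, so the mapping can be splatted straight into replace. A minimal sketch, assuming the lookup table lives in a Dict called mapping (a name chosen here for illustration):

```julia
x = ["a", "ab", "ac", "b", "bc", "bd"]

# Hypothetical lookup table, analogous to the dict passed to pandas' df.replace
mapping = Dict("a" => "aa", "b" => "bb")

# Splatting the Dict passes its entries as separate old => new Pair arguments
y = replace(x, mapping...)
println(y)
```

Only whole elements equal to a key are replaced here, so "ab" stays "ab", just as in the explicit-pairs call above.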
If you have a more complex replacement pattern, you can pass a function that does the replacement:
julia> replace(x) do s
length(s) == 1 ? s^2 : s
end
6-element Vector{String}:
"aa"
"ab"
"ac"
"bb"
"bc"
"bd"
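The function form can also express the dictionary lookup directly: return the mapped value when the element is a key, otherwise return the element unchanged. A sketch under the same assumption of a hypothetical Dict named mapping:

```julia
x = ["a", "ab", "ac", "b", "bc", "bd"]

# Hypothetical lookup table
mapping = Dict("a" => "aa", "b" => "bb")

# get(mapping, s, s) falls back to s itself when s is not a key
y = replace(s -> get(mapping, s, s), x)
```

The same lambda works with map, which some readers may find more familiar; replace simply makes the intent explicit.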
There is also replace!, which does the same in-place.
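To illustrate what "in-place" means, here is a small sketch in which alias is a second name (introduced only for this example) bound to the same vector; replace! overwrites the matching elements of that one vector and returns it:

```julia
x = ["a", "ab", "ac", "b", "bc", "bd"]
alias = x  # second binding to the same vector object, no copy is made

# replace! mutates x and returns the very same array
result = replace!(x, "a" => "aa", "b" => "bb")

@show result === x  # the returned array is the mutated input
@show alias         # the change is visible through every reference
```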
Is this what you wanted?
Replacement of substrings in a vector of strings (passing several pairs to replace on a string requires Julia 1.7 or later):
julia> df = DataFrame("A" => ["This is an ex.", "This is a samp.", "This is a samp. of an ex."])
3×1 DataFrame
Row │ A
│ String
─────┼───────────────────────────
1 │ This is an ex.
2 │ This is a samp.
3 │ This is a samp. of an ex.
julia> df.A .= replace.(df.A, "ex." => "example", "samp." => "sample")
3-element Vector{String}:
"This is an example"
"This is a sample"
"This is a sample of an example"
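To keep the abbreviation table in one place, as in the pandas workflow, the same broadcast call can be fed from a Dict. A sketch, assuming Julia 1.7 or later and a hypothetical table named abbrevs:

```julia
sentences = ["This is an ex.", "This is a samp.", "This is a samp. of an ex."]

# Hypothetical abbreviation table; keys are matched as literal substrings, not regexes
abbrevs = Dict("ex." => "example", "samp." => "sample")

# Splat the Dict into Pair arguments; broadcasting treats each Pair as a scalar
expanded = replace.(sentences, abbrevs...)
```

Two caveats, offered as judgment calls rather than rules: a Dict has no guaranteed iteration order, so if two abbreviations can match at the same position, a Vector of pairs (which keeps the order you wrote) may be safer; and because keys are plain substrings, "ex." would also match inside a word such as "Alex.", which a Regex key like r"\bex\." => "example" would avoid.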
Note two things:
1. You do not need to pass a Dict to the DataFrame constructor. It is enough to just pass pairs.
2. I use .= rather than =, which performs an in-place replacement of the updated values in the already existing vector. I show it for comparison with what @Sundar R proposed in a comment, which is an alternative that allocates a new vector; the difference probably does not matter much in your case, but I wanted to show you both syntaxes.
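The difference between .= and = can be seen without DataFrames on a plain vector. In this sketch, w is a second name (introduced only for illustration) bound to the original vector, which lets us check whether each assignment reused that vector or created a new one:

```julia
v = ["This is an ex.", "This is a samp."]
w = v  # second binding to the same vector object

# .= writes the new strings into the existing vector, so w sees them too
v .= replace.(v, "ex." => "example", "samp." => "sample")
same_after_dot_assign = v === w

# = allocates a new vector and rebinds v to it; w keeps the old object
v = replace.(v, "example" => "ex.")
same_after_assign = v === w
```

With a DataFrame the same logic applies: df.A .= ... fills the existing column vector, while df.A = ... replaces the column with a newly allocated one.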