Suppose I have a DataFrame with two columns, gibberish and letter. I want to remove characters from gibberish so that only the ones matching letter remain; e.g. if gibberish is "kjkkj" and letter is "j", I want gibberish to equal "jj".
The DataFrame is defined as:
using DataFrames
df = DataFrame(gibberish = ["dqzzzjbzz", "jjjvjmjjkjjjjjjj", "mmbmmlvmbmmgmmf"], letter = ["z", "j", "m"])
If I had no letter variable and only wanted, let's say, "x" to remain, I would do:
df.gibberish .= replace.(df.gibberish, r"[^x;]" => "")
and that works fine, but when I try the same thing with the letter column as a variable in the regular expression, it just breaks.
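A minimal sketch of what the fixed-letter version does, runnable in plain Julia without DataFrames.jl; the vector words is a hypothetical stand-in for df.gibberish. The negated character class [^x;] matches any character that is neither x nor ;, so deleting every match keeps only the x characters (and any semicolons).

```julia
# Hypothetical stand-in for df.gibberish
words = ["axbxc", "xyz", "abc"]

# r"[^x;]" matches any character other than 'x' or ';',
# so deleting every match leaves only the x's (and semicolons, if any).
# Broadcasting treats the single `regex => ""` Pair as a scalar,
# so the same pattern is applied to every element of `words`.
kept = replace.(words, r"[^x;]" => "")

println(kept)
```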
I tried doing that both the "normal" DataFrames.jl way and with the DataFramesMeta.jl shortcut @transform:
df.gibberish .= replace.(df.gibberish, Regex(join(["[^", letter, ";]"])) => "")
which fails with the error
ERROR: UndefVarError: letter not defined
while the @transform way runs but doesn't change anything:
julia> @transform(df, filtered = replace(:gibberish, Regex.(join(["[^", :letter, ";]"])) => ""))
3×3 DataFrame
│ Row │ letter │ gibberish │ filtered │
│ │ String │ String │ String │
├──────┼────────┼───────────────────┼───────────────────┤
│ 1 │ z │ dqzzzjbzz │ dqzzzjbzz │
│ 2 │ j │ jjjvjmjjkjjjjjjj │ jjjvjmjjkjjjjjjj │
│ 3 │ m │ mmbmmlvmbmmgmmf │ mmbmmlvmbmmgmmf │
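Assuming @transform hands each column to the expression as a whole vector (which is how DataFramesMeta.jl's column-wise @transform behaves), the unchanged output is likely explained by the missing dot on replace: called on a Vector, replace swaps out elements equal to the given value rather than substrings matching it, and no String is ever equal to a Regex. (The Regex itself is also built by join from the printed form of the whole :letter vector, not once per row.) The sketch below uses a hypothetical vector col and the fixed pattern r"[^z]" to show the difference in plain Julia.

```julia
# Hypothetical stand-in for the :gibberish column that @transform passes in
col = ["dqzzzjbzz", "jjjvjmjjkjjjjjjj", "mmbmmlvmbmmgmmf"]

# Without the dot, replace works on the Vector itself: it replaces
# elements *equal to* the Regex, not substrings matching it.
# No String is equal to a Regex, so nothing changes.
unchanged = replace(col, r"[^z]" => "")

# With the dot, replace runs on each String separately,
# so the Regex is matched against that string's characters.
per_string = replace.(col, r"[^z]" => "")

println(unchanged)
println(per_string)
```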
I'm a complete beginner in Julia and am probably missing something very basic, but the proper solution escapes me. How do I solve this problem other than writing a row-wise loop, which would be horribly inefficient?
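In Regex(join(["[^", letter, ";]"])), letter refers to a Julia variable (which is not defined), not to a column of the DataFrame; the column has to be accessed as df.letter. Even then, join would turn the whole column into a single pattern, so the Regex construction has to be broadcast to get one pattern per row.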
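A minimal per-row sketch in plain Julia, with the hypothetical vectors gibberish and letter standing in for df.gibberish and df.letter. Each row's letter is spliced into its own Regex, which is then applied to the same row's string. It assumes every letter is a plain character with no special meaning inside [...] (characters such as ], ^, \ or - would need escaping). Since Julia compiles comprehensions and loops, a row-wise approach like this is not inherently slow.

```julia
# Hypothetical stand-ins for df.gibberish and df.letter; with a DataFrame,
# pass the columns themselves (df.gibberish, df.letter), not bare names.
gibberish = ["dqzzzjbzz", "jjjvjmjjkjjjjjjj", "mmbmmlvmbmmgmmf"]
letter = ["z", "j", "m"]

# Build one Regex per row from that row's letter and apply it to the
# same row's string. Assumes each letter needs no regex escaping.
filtered = [replace(g, Regex("[^" * l * "]") => "") for (g, l) in zip(gibberish, letter)]

println(filtered)
```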
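You could try something like
Regex.("[^" .* df.letter .* ";]")
to construct an array of Regexes, one per DataFrame row, and then pair each one with "" using .=> (a plain => would pair the whole array with ""):
df.gibberish .= replace.(df.gibberish, Regex.("[^" .* df.letter .* ";]") .=> "")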
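A runnable check of the broadcast approach, again with hypothetical plain vectors in place of the DataFrame columns. The sketch drops the ; from the character class, which in the original only has the effect of also keeping semicolons.

```julia
# Hypothetical stand-ins for df.gibberish and df.letter
gibberish = ["dqzzzjbzz", "jjjvjmjjkjjjjjjj", "mmbmmlvmbmmgmmf"]
letter = ["z", "j", "m"]

# One Regex per row, built by broadcasting string concatenation (.*)
patterns = Regex.("[^" .* letter .* "]")

# .=> pairs each pattern with "" element-wise, giving a Vector of Pairs,
# so replace. applies row i's pattern to row i's string.
filtered = replace.(gibberish, patterns .=> "")

# With a DataFrame the same line would read:
# df.gibberish = replace.(df.gibberish, Regex.("[^" .* df.letter .* "]") .=> "")
println(filtered)
```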