Reputation: 701
I have a DataFrame with 3 columns: Col1, Col2, Col3. Each column contains an array of strings that has one of these values: "V1", "V2", "V3". For example, an array can look like this: ["V2", "V3"]
I want to create three new columns called: :V1, :V2, :V3 with values "Yes" in :V1 if :V1 appears in ANY of the arrays in columns Col1, Col2, Col3. Same for V2 and V3.
Here is my current code which works but got there by trial and error and wonder if there is a simpler solution (perhaps a one liner):
@chain df begin
transform(Cols(r"^Col") => ByRow((x...) -> any(any(y -> y == "V1", arr) for arr in x) ? "Yes" : "No") => :V1)
transform(Cols(r"^Col") => ByRow((x...) -> any(any(y -> y == "V2", arr) for arr in x) ? "Yes" : "No") => :V2)
transform(Cols(r"^Col") => ByRow((x...) -> any(any(y -> y == "V3", arr) for arr in x) ? "Yes" : "No") => :V3)
end
Is there a simpler approach and a way to make this one line as the logic is repeated.
Can I use in
function to check if a value is contained in the array? How to implement a solution using in
?
Upvotes: 1
Views: 43
Reputation: 69899
Alternative one-liner:
transform(df, [AsTable(Cols(r"^Col")) => ByRow(x -> v in x ? "Yes" : "No") => v for v in string.("V", 1:3)])
Upvotes: 0
Reputation: 19088
A naive approach would be looping through the elements ["V1", "V2", "V3"]
and comparing them with each element per row, then replacing the boolean vectors with the desired strings. Finally adding the vectors to the DataFrame.
using DataFrames
for ele in ["V1", "V2", "V3"]
vec = []
for row in eachrow(df)
push!(vec, any(Vector(row) .== ele))
end
df[:,ele] = [i ? "Yes" : "No" for i in vec]
end
result
df
3×6 DataFrame
Row │ Col1 Col2 Col3 V1 V2 V3
│ String String String String String String
─────┼────────────────────────────────────────────────
1 │ V1 V2 V1 Yes Yes No
2 │ V3 V1 V1 Yes No Yes
3 │ V2 V3 V3 No Yes Yes
df = DataFrame([["V1", "V3", "V2"],
["V2", "V1", "V3"],
["V1", "V1", "V3"]],
["Col1", "Col2", "Col3"])
Upvotes: 2