Reputation: 16004
Say I have a column:
using DataFrames
df = DataFrame(var = ["methodA_mean", "methodB_var", "methodA_var"])
3×1 DataFrame
│ Row │ var │
│ │ String │
├─────┼──────────────┤
│ 1 │ methodA_mean │
│ 2 │ methodB_var │
│ 3 │ methodA_var │
and I would like to create two new columns by extracting the method letter (A or B) and the statistic (mean or var), like so:
3×3 DataFrame
│ Row │ var │ ab │ stat │
│ │ String │ String │ String │
├─────┼──────────────┼────────┼────────┤
│ 1 │ methodA_mean │ A │ mean │
│ 2 │ methodB_var │ B │ var │
│ 3 │ methodA_var │ A │ var │
I can write a regex to extract "A" or "B" and "mean" or "var" from the var column, but how do I output the result into multiple columns elegantly?
I tried the code below and it works, but I feel there should be a more elegant way to create multiple columns:
tmp = match.(r"method(?<ab>A|B)_(?<stat>mean|var)", df.var)
df.ab = getindex.(tmp, :ab)
df.stat = getindex.(tmp, :stat)
3×3 DataFrame
│ Row │ var │ ab │ stat │
│ │ String │ SubStri… │ SubStri… │
├─────┼──────────────┼──────────┼──────────┤
│ 1 │ methodA_mean │ A │ mean │
│ 2 │ methodB_var │ B │ var │
│ 3 │ methodA_var │ A │ var │
Upvotes: 2
Views: 765
Reputation: 6999
The DataFrames functions select and transform can handle multiple column outputs by returning a Matrix from the function. For your example:
df = DataFrame(var = ["methodA_mean", "methodB_var", "methodA_var"])
function extract(xs)
    matches = match.(r"method(?<ab>A|B)_(?<stat>mean|var)", xs)
    # returns a 3x2 Matrix: one row per input string, one column per capture group
    return [getindex(m, i) for m in matches, i in (:ab, :stat)]
end
df2 = transform(df, :var => extract => [:ab, :stat])
The transform function in the final line takes the var column and sends it to the extract function (as a Vector), which then returns a Matrix. The columns of that Matrix are added to the DataFrame with the names specified after the second =>.
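To see what the extract step hands back before DataFrames names the columns, here is a minimal Base-only sketch of the same comprehension. The names vars, matches and M are just illustrative, and the sample values are the ones from the question:

```julia
# Base-only sketch: no DataFrames needed to inspect the intermediate Matrix.
vars = ["methodA_mean", "methodB_var", "methodA_var"]  # sample values from the question

# One RegexMatch per input string (this assumes every string matches the pattern).
matches = match.(r"method(?<ab>A|B)_(?<stat>mean|var)", vars)

# Rows follow `matches`, columns follow the tuple of capture-group names.
M = [m[name] for m in matches, name in (:ab, :stat)]

size(M)   # (3, 2)
M[:, 1]   # values captured by the `ab` group
M[:, 2]   # values captured by the `stat` group
```

Each column of M lines up with one of the names passed after the second =>, which is why the order of (:ab, :stat) in the comprehension should match the order of [:ab, :stat] in the transform call.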
Upvotes: 1
Reputation: 69819
I am not sure which part of your code you are looking to improve, as it seems normal and OK to me, but you could write, e.g., this:
julia> insertcols!(df, :ab => last.(first.(df.var, 7), 1), :stat => chop.(df.var, head=8, tail=0))
3×3 DataFrame
│ Row │ var │ ab │ stat │
│ │ String │ String │ SubStri… │
├─────┼──────────────┼────────┼──────────┤
│ 1 │ methodA_mean │ A │ mean │
│ 2 │ methodB_var │ B │ var │
│ 3 │ methodA_var │ A │ var │
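For reference, the positional calls in that one-liner can be unpacked on a single string as below. This is only a sketch with illustrative variable names (s, prefix, ab, stat), and it assumes every value has the fixed layout "method" + one letter + "_" + statistic:

```julia
s = "methodA_mean"

prefix = first(s, 7)            # first 7 characters: "methodA"
ab = last(prefix, 1)            # last character of that prefix: "A"
stat = chop(s, head=8, tail=0)  # drop the first 8 characters and none from the end: "mean"
```

Unlike the regex version, this does no validation, so a value with a different prefix length would silently give the wrong pieces.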
Upvotes: 2