xiaodai

Reputation: 16004

Julia: How to create multiple columns with one function in DataFrames.jl

Say I have a column

using DataFrames

df = DataFrame(var = ["methodA_mean", "methodB_var", "methodA_var"])
3×1 DataFrame
│ Row │ var          │
│     │ String       │
├─────┼──────────────┤
│ 1   │ methodA_mean │
│ 2   │ methodB_var  │
│ 3   │ methodA_var  │

and I would like to create two new columns by extracting the method letter (A or B) and the statistic (mean or var), like so

3×3 DataFrame
│ Row │ var          │ ab     │ stat   │
│     │ String       │ String │ String │
├─────┼──────────────┼────────┼────────┤
│ 1   │ methodA_mean │ A      │ mean   │
│ 2   │ methodB_var  │ B      │ var    │
│ 3   │ methodA_var  │ A      │ var    │

I can write a regex to extract "A" or "B" and "mean" or "var" from the var column. But how do I output the result into multiple columns elegantly?

I tried the code below and it works, but I feel there should be a more elegant way to create multiple columns

tmp = match.(r"method(?<ab>A|B)_(?<stat>mean|var)", df.var)

df.ab = getindex.(tmp, :ab)
df.stat = getindex.(tmp, :stat)
3×3 DataFrame
│ Row │ var          │ ab       │ stat     │
│     │ String       │ SubStri… │ SubStri… │
├─────┼──────────────┼──────────┼──────────┤
│ 1   │ methodA_mean │ A        │ mean     │
│ 2   │ methodB_var  │ B        │ var      │
│ 3   │ methodA_var  │ A        │ var      │

Upvotes: 2

Views: 765

Answers (2)

mentics

Reputation: 6999

The DataFrames functions select and transform can handle multiple column outputs by returning a Matrix from the function. For your example:

df = DataFrame(var = ["methodA_mean", "methodB_var", "methodA_var"])

function extract(xs)
  matches = match.(r"method(?<ab>A|B)_(?<stat>mean|var)", xs)
  # returns a 3x2 Matrix: one row per match, one column per capture group
  return [getindex(m, i) for m in matches, i in (:ab, :stat)]
end

df2 = transform(df, :var => extract => [:ab, :stat])

The transform function in the final line takes the var column, sends it to the extract function (as a Vector) which then returns a Matrix. The columns of that Matrix are added to the DataFrame with the names specified after the second =>.
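
A row-wise variant of the same idea may also be worth a look. This is only a sketch, assuming DataFrames.jl 1.x: a function wrapped in ByRow returns a NamedTuple for each row, and the AsTable destination expands its fields into columns named after those fields. The helper name extract_row is hypothetical, and the sketch assumes every value matches the pattern (match returns nothing otherwise).

```julia
using DataFrames

df = DataFrame(var = ["methodA_mean", "methodB_var", "methodA_var"])

# Hypothetical helper: parse one string and return a NamedTuple whose
# field names become the names of the new columns.
# Assumes the regex matches; `match` returns `nothing` when it does not.
function extract_row(s)
    m = match(r"method(?<ab>A|B)_(?<stat>mean|var)", s)
    return (ab = m[:ab], stat = m[:stat])
end

# ByRow applies extract_row to each element of :var;
# AsTable turns the resulting NamedTuples into the columns :ab and :stat.
df2 = transform(df, :var => ByRow(extract_row) => AsTable)
```

With this form the column names live next to the parsing logic, so the target list does not have to be kept in sync with the regex by hand.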

Upvotes: 1

Bogumił Kamiński

Reputation: 69819

I am not sure which part of your code you are looking to improve, as it seems normal and OK to me, but you could write e.g. this:

julia> insertcols!(df, :ab => last.(first.(df.var, 7), 1), :stat => chop.(df.var, head=8, tail=0))
3×3 DataFrame
│ Row │ var          │ ab     │ stat     │
│     │ String       │ String │ SubStri… │
├─────┼──────────────┼────────┼──────────┤
│ 1   │ methodA_mean │ A      │ mean     │
│ 2   │ methodB_var  │ B      │ var      │
│ 3   │ methodA_var  │ A      │ var      │
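
To see what each piece of that call does, the string functions can be tried on a single value. This is a small sketch, relying on the assumption built into the call above that every value starts with "method", followed by one letter and an underscore:

```julia
s = "methodA_mean"

prefix = first(s, 7)             # first 7 characters: "methodA"
ab = last(prefix, 1)             # last character of that prefix: "A"
stat = chop(s, head=8, tail=0)   # drop the first 8 characters: "mean"
```

Because this slices by position rather than by pattern, it only fits data with that fixed layout.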

Upvotes: 2
