Kevin L. Keys

Vectorized join of two or more columns of a DataFrame in Julia

I have a Julia DataFrame with several String and Int columns. I want to glue them together horizontally in vectorized fashion to produce one column. In R, I would use paste. Is this possible in Julia?

The desired output is not that of an hcat or vcat operation. The goal is to make a single new column of strings whose rows are "x1[i]:x2[i]", where x1[i] and x2[i] are the corresponding row elements of columns x1 and x2 of the DataFrame.

Julia Example:

# tested in Julia v0.5.0 and v0.6.2
using DataFrames

# example data frame
y = DataFrame(x1 = [1,2,3], x2 = ["A","B","C"])

# goal: make column ["1:A", "2:B", "3:C"]
# desired output format for one row
join( [ y[1,:x1], y[1,:x2] ], ":" ) # > "1:A"

# doesn't work with vectors, makes one long string
# (0.5) > "[1,2,3]:String[\"A\",\"B\",\"C\"]"
# (0.6) > "Any[1, 2, 3]:Any[\"A\", \"B\", \"C\"]"
join([y[:,:x1], y[:,:x2]], ":")

# default broadcast operation doesn't work either
# (0.5) > ERROR: MethodError: no method matching size(::String)
# (0.6) > 2-element Array{String,1}:
#           "1:2:3"
#           "A:B:C"
join.([y[:,:x1], y[:,:x2]], ":")
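The last result comes from what the dot broadcasts over: `[y[:,:x1], y[:,:x2]]` is a two-element vector whose elements are whole columns, so `join` runs once per column. A minimal sketch on plain vectors standing in for the two columns (assuming Julia 1.x; not checked on 0.5 or 0.6) shows the difference. `zip` pairs the columns row by row, so broadcasting over it calls `join` once per row:

```julia
# Plain vectors standing in for the columns y[:, :x1] and y[:, :x2]
x1 = [1, 2, 3]
x2 = ["A", "B", "C"]

# The dot broadcasts over the OUTER vector, which has two elements
# (one per column), so join runs once per column.
per_column = join.([x1, x2], ":")   # ["1:2:3", "A:B:C"]

# zip yields one tuple per row, e.g. (1, "A"), so broadcasting join
# over it runs once per row. The ":" string is treated as a scalar.
per_row = join.(zip(x1, x2), ":")   # ["1:A", "2:B", "3:C"]
```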

R Example:

# same data structure as before
y = data.frame(x1 = c(1:3), x2 = c("A", "B", "C"))

# desired output format with 'paste'
paste(y$x1, y$x2, sep = ":") # > "1:A" "2:B" "3:C"

Answers (1)

Dan Getz

Possible alternatives are:

  1. ["$(r[:x1]):$(r[:x2])" for r in eachrow(y)]

  2. [join(Array(r),":") for r in eachrow(y)]

  3. mapslices(x->join(x,":"),(Array(y)),2)

  4. map(x->join(x,":"),zip(y[:x1],string.(y[:x2])))

  5. [string(y[:x1][i])*":"*string(y[:x2][i]) for i=1:nrow(y)]
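A further broadcast-based option, sketched on the assumption of Julia 1.x: broadcasting `string` over both columns is the closest counterpart of R's `paste(y$x1, y$x2, sep = ":")`. It relies on broadcast treating a string argument as a scalar, which the 0.5 error in the question suggests older versions did not do. Plain vectors stand in for the columns here; with the data frame the call would be `string.(y[:, :x1], ":", y[:, :x2])`.

```julia
x1 = [1, 2, 3]
x2 = ["A", "B", "C"]

# Broadcasting string over both vectors works elementwise; the ":" literal
# is treated as a scalar and reused for every row.
pasted = string.(x1, ":", x2)   # ["1:A", "2:B", "3:C"]
```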

They differ in performance; option 5 is the fastest, though it is also more specialized.
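Options 1, 4 and 5 name the two columns explicitly. If a closer match to R's `paste` is wanted for any number of chosen columns, one possible generalization of option 4 is the helper below. `paste` is a hypothetical name chosen for illustration, not a function from Base or DataFrames; the sketch assumes Julia 1.x, and no claim is made about how its speed compares with option 5.

```julia
# Hypothetical helper named after R's paste; not part of Base or DataFrames.
# Takes any number of equal-length vectors and joins them row by row.
function paste(cols::AbstractVector...; sep::AbstractString = " ")
    # zip(cols...) yields one tuple per row, e.g. (1, "A");
    # join prints each element and puts sep between them.
    return [join(row, sep) for row in zip(cols...)]
end

x1 = [1, 2, 3]
x2 = ["A", "B", "C"]
x3 = [:u, :v, :w]

paste(x1, x2; sep = ":")       # ["1:A", "2:B", "3:C"]
paste(x1, x2, x3; sep = "-")   # ["1-A-u", "2-B-v", "3-C-w"]
```

With the data frame from the question this would be called as `paste(y[:, :x1], y[:, :x2]; sep = ":")`. One difference from R: `zip` stops at the shortest input, whereas R's `paste` recycles shorter vectors. Data frame columns always have equal length, so this does not matter here.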
