Reputation: 1936

Transpose of Julia DataFrame

Let's create a Julia DataFrame:

df=convert(DataFrame, rand(10, 4))

I am trying to take the transpose of this DataFrame. The "transpose" function does not appear to work on a Julia DataFrame.

I have used the Python pandas DataFrame package extensively in the past. In Python, it would be as easy as "df.T". Please let me know a way to transpose this DataFrame.

Upvotes: 7

Answers (4)

Tomas

Reputation: 4246

permutedims does this.

Often when you want to transpose a dataframe, you already have a column with names (Strings or Symbols) whose values should become the new column names.

In the asker's random matrix example..

df = DataFrame(rand(10, 4), :auto)

..there aren't any names for the new columns. So we'll use the row numbers:

df.id = string.(1:nrow(df))  # Add column with names
permutedims(df, "id", "")

We used the optional third argument of permutedims to rename the new id column to the empty string, which is not necessary but can be nice.
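
For the other case mentioned above, where the dataframe already has a column of names, a minimal sketch could look like the following. The data and the column name `name` are made up for illustration, and this assumes a DataFrames.jl version that provides `permutedims` for dataframes (1.0 or later).

```julia
using DataFrames

# Hypothetical data: the `name` column holds the future column names.
df = DataFrame(name = ["x", "y", "z"], a = [1, 2, 3], b = [4, 5, 6])

# The values of `name` become the new column names; the old column
# names (a, b) end up in the first column, which keeps the name `name`.
dft = permutedims(df, :name)
```

Here `dft` has the columns `name`, `x`, `y` and `z`, with one row for each of the original columns `a` and `b`.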

Upvotes: 2

Soldalma

Reputation: 4758

This works with dataframes that are not too complicated. One of the dataframe's columns is used to generate column names. The names of the other columns become row names.

function all_unique(v::Vector)::Bool
    return length(unique(v)) == length(v)
end

function df_add_first_column(
    df::DataFrame,
    colname::Union{Symbol,String},
    col_data
)
    df1 = DataFrame([colname => col_data])
    hcat(df1, df)
end

function df_transpose(df::DataFrame, col::Union{Symbol, String})::DataFrame
    @assert all_unique(df[!, col]) "Column `col` contains non-unique elements"

    function foo(i)
        string(df[i, col]) => collect(df[i, Not(col)])
    end

    dft = DataFrame(map(foo, 1:nrow(df)))

    return df_add_first_column(dft, "Row", filter(x -> x != string(col), names(df)))
end

Example:

df0 = DataFrame(A = [1, 2, 3], B = rand(3), C = rand(3))

3×3 DataFrame
 Row │ A      B         C        
     │ Int64  Float64   Float64
─────┼───────────────────────────
   1 │     1  0.578605  0.590092
   2 │     2  0.350394  0.399114
   3 │     3  0.90852   0.710629

df_transpose(df0, :A)

2×4 DataFrame
 Row │ Row     1         2         3        
     │ String  Float64   Float64   Float64
─────┼──────────────────────────────────────
   1 │ B       0.578605  0.350394  0.90852
   2 │ C       0.590092  0.399114  0.710629

Upvotes: 0

scls

Reputation: 17647

The problem with Stephen's answer is that the order of columns is not preserved. If you are not convinced, try it with the following DataFrame:

julia> df = DataFrame(A = 1:4, B = 5:8, AA = 15:18)
4×3 DataFrame
│ Row │ A     │ B     │ AA    │
│     │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1   │ 1     │ 5     │ 15    │
│ 2   │ 2     │ 6     │ 16    │
│ 3   │ 3     │ 7     │ 17    │
│ 4   │ 4     │ 8     │ 18    │

However, this DataFrame can be transposed (keeping the order of columns and rows) using:

julia> DataFrame([[names(df)]; collect.(eachrow(df))], [:column; Symbol.(axes(df, 1))])
3×5 DataFrame
│ Row │ column │ 1     │ 2     │ 3     │ 4     │
│     │ Symbol │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼────────┼───────┼───────┼───────┼───────┤
│ 1   │ A      │ 1     │ 2     │ 3     │ 4     │
│ 2   │ B      │ 5     │ 6     │ 7     │ 8     │
│ 3   │ AA     │ 15    │ 16    │ 17    │ 18    │

Reference: https://github.com/JuliaData/DataFrames.jl/issues/2065#issuecomment-568937464

Upvotes: 10

Stephen Nicar

Reputation: 106

I had the same question and tried the strategy suggested in the comments to your question. The problem I encountered, however, is that converting to a Matrix won't work if your DataFrame has NA values. You have to change them to something else, then convert to a Matrix. I had a lot of problems converting back to NA when I wanted to get from a Matrix back to a DataFrame type.
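
For reference, the comments are not reproduced here, so the following is only a sketch of what I take the Matrix strategy to be. In recent DataFrames.jl versions NA has been replaced by `missing`, and there a round trip through a Matrix appears to keep missing values intact. The example data and the column name `column` are made up for illustration.

```julia
using DataFrames

# Hypothetical data with one missing value.
df = DataFrame(A = [1, missing, 3], B = [4, 5, 6])

# Matrix(df) keeps `missing`; the element type becomes Union{Missing, Int64}.
m = Matrix(df)

# Transpose the matrix and use the old row numbers as new column names.
dft = DataFrame(permutedims(m), string.(1:nrow(df)))

# Put the old column names in front as an extra column.
insertcols!(dft, 1, :column => names(df))
```

Note that a Matrix has a single element type, so this only makes sense when the columns share a compatible type.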

Here's a way to do it using the stack and unstack functions from DataFrames.

julia> using DataFrames

julia> df = DataFrame(A = 1:4, B = 5:8)
4×2 DataFrame
│ Row │ A     │ B     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 5     │
│ 2   │ 2     │ 6     │
│ 3   │ 3     │ 7     │
│ 4   │ 4     │ 8     │

julia> colnames = names(df)
2-element Array{Symbol,1}:
 :A
 :B

julia> df[!, :id] = 1:size(df, 1)
1:4

julia> df
4×3 DataFrame
│ Row │ A     │ B     │ id    │
│     │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1   │ 1     │ 5     │ 1     │
│ 2   │ 2     │ 6     │ 2     │
│ 3   │ 3     │ 7     │ 3     │
│ 4   │ 4     │ 8     │ 4     │

Adding the :id column is suggested by the DataFrames documentation as a way to help with unstacking.

Now stack the columns you want to transpose:

julia> dfl = stack(df, colnames)
8×3 DataFrame
│ Row │ variable │ value │ id    │
│     │ Symbol   │ Int64 │ Int64 │
├─────┼──────────┼───────┼───────┤
│ 1   │ A        │ 1     │ 1     │
│ 2   │ A        │ 2     │ 2     │
│ 3   │ A        │ 3     │ 3     │
│ 4   │ A        │ 4     │ 4     │
│ 5   │ B        │ 5     │ 1     │
│ 6   │ B        │ 6     │ 2     │
│ 7   │ B        │ 7     │ 3     │
│ 8   │ B        │ 8     │ 4     │

Then unstack, switching the id and variable names (this is why adding the :id column is necessary).

julia> dfnew = unstack(dfl, :variable, :id, :value)
2×5 DataFrame
│ Row │ variable │ 1      │ 2      │ 3      │ 4      │
│     │ Symbol   │ Int64⍰ │ Int64⍰ │ Int64⍰ │ Int64⍰ │
├─────┼──────────┼────────┼────────┼────────┼────────┤
│ 1   │ A        │ 1      │ 2      │ 3      │ 4      │
│ 2   │ B        │ 5      │ 6      │ 7      │ 8      │

Upvotes: 9
