Reputation: 1936
Let's create a Julia DataFrame:
df=convert(DataFrame, rand(10, 4))
It would look like this. I am trying to take the transpose of this DataFrame. The transpose function does not appear to work for a Julia DataFrame, as shown below.
I have used the Python pandas DataFrame package extensively in the past. In Python, it would be as easy as df.T. Please let me know a way to transpose this DataFrame.
Upvotes: 7
Views: 8774
Reputation: 4186
permutedims does this.
Often when you want to transpose a dataframe, you already have a column with names (Strings or Symbols):
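A minimal sketch of that case, assuming a recent DataFrames.jl (1.x). The DataFrame and its column names (name, a, b) are made up for illustration:

```julia
using DataFrames

# Hypothetical data: the `name` column holds the future column names.
df = DataFrame(name = ["x", "y"], a = [1, 2], b = [3, 4])

# Transpose, using the values of `name` as the new column names.
# The old column names ("a", "b") end up in the first column,
# which keeps the name "name" by default.
dft = permutedims(df, :name)
```

Here dft has the columns name, x and y, with one row per original column (a and b).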
In the asker's random matrix example..
df = DataFrame(rand(10, 4), :auto)
..there aren't any names for the new columns. So we'll use the row numbers:
df.id = string.(1:nrow(df)) # Add column with names
permutedims(df, "id", "")
We used the optional third argument of permutedims to rename the new id column to the empty string, which is not necessary but can be nice.
Upvotes: 1
Reputation: 4758
This works with dataframes that are not too complicated. One of the dataframe's columns is used to generate column names. The names of the other columns become row names.
function all_unique(v::Vector)::Bool
    return length(unique(v)) == length(v)
end

function df_add_first_column(
    df::DataFrame,
    colname::Union{Symbol,String},
    col_data
)
    df1 = DataFrame([colname => col_data])
    hcat(df1, df)
end

function df_transpose(df::DataFrame, col::Union{Symbol, String})::DataFrame
    @assert all_unique(df[!, col]) "Column `col` contains non-unique elements"
    function foo(i)
        string(df[i, col]) => collect(df[i, Not(col)])
    end
    dft = DataFrame(map(foo, 1:nrow(df)))
    return df_add_first_column(dft, "Row", filter(x -> x != string(col), names(df)))
end
Example:
df0 = DataFrame(A = [1, 2, 3], B = rand(3), C = rand(3))
3×3 DataFrame
 Row │ A      B         C
     │ Int64  Float64   Float64
─────┼───────────────────────────
   1 │     1  0.578605  0.590092
   2 │     2  0.350394  0.399114
   3 │     3  0.90852   0.710629
df_transpose(df0, :A)
2×4 DataFrame
 Row │ Row     1         2         3
     │ String  Float64   Float64   Float64
─────┼──────────────────────────────────────
   1 │ B       0.578605  0.350394  0.90852
   2 │ C       0.590092  0.399114  0.710629
Upvotes: 0
Reputation: 17597
The problem with Stephen's answer is that the order of columns is not preserved (try it with the following DataFrame if you are not convinced):
julia> df = DataFrame(A = 1:4, B = 5:8, AA = 15:18)
4×3 DataFrame
│ Row │ A │ B │ AA │
│ │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1 │ 1 │ 5 │ 15 │
│ 2 │ 2 │ 6 │ 16 │
│ 3 │ 3 │ 7 │ 17 │
│ 4 │ 4 │ 8 │ 18 │
This DataFrame can, however, be transposed (keeping the order of columns/rows) using:
julia> DataFrame([[names(df)]; collect.(eachrow(df))], [:column; Symbol.(axes(df, 1))])
3×5 DataFrame
│ Row │ column │ 1 │ 2 │ 3 │ 4 │
│ │ Symbol │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼────────┼───────┼───────┼───────┼───────┤
│ 1 │ A │ 1 │ 2 │ 3 │ 4 │
│ 2 │ B │ 5 │ 6 │ 7 │ 8 │
│ 3 │ AA │ 15 │ 16 │ 17 │ 18 │
Reference: https://github.com/JuliaData/DataFrames.jl/issues/2065#issuecomment-568937464
Upvotes: 10
Reputation: 106
I had the same question and tried the strategy suggested in the comments to your question. The problem I encountered, however, is that converting to a Matrix won't work if your DataFrame has NA values. You have to change them to something else, then convert to a Matrix. I had a lot of problems converting back to NA when I wanted to get from a Matrix back to a DataFrame type.
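For comparison, the Matrix round trip described above can be sketched as follows. This assumes a recent DataFrames.jl (1.x), where DataFrame(matrix, :auto) generates the column names x1, x2, and so on. The variable names are illustrative, and the example DataFrame has no missing values, so the sketch does not address the NA problem:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 5:8)

# Convert to a Matrix and swap its dimensions (4×2 becomes 2×4).
# Columns of different types would be promoted to a common element type here.
m = permutedims(Matrix(df))

# Back to a DataFrame with auto-generated column names x1..x4.
dft = DataFrame(m, :auto)

# Keep the original column names as the first column
# (the name `column` is an arbitrary choice).
insertcols!(dft, 1, :column => names(df))
```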
Here's a way to do it using DataFrame's stack and unstack functions.
julia> using DataFrames
julia> df = DataFrame(A = 1:4, B = 5:8)
4×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 5 │
│ 2 │ 2 │ 6 │
│ 3 │ 3 │ 7 │
│ 4 │ 4 │ 8 │
julia> colnames = names(df)
2-element Array{Symbol,1}:
:A
:B
julia> df[!, :id] = 1:size(df, 1)
1:4
julia> df
4×3 DataFrame
│ Row │ A │ B │ id │
│ │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1 │ 1 │ 5 │ 1 │
│ 2 │ 2 │ 6 │ 2 │
│ 3 │ 3 │ 7 │ 3 │
│ 4 │ 4 │ 8 │ 4 │
Adding the :id column is suggested by the DataFrame documentation as a way to help with unstacking.
Now stack the columns you want to transpose:
julia> dfl = stack(df, colnames)
8×3 DataFrame
│ Row │ variable │ value │ id │
│ │ Symbol │ Int64 │ Int64 │
├─────┼──────────┼───────┼───────┤
│ 1 │ A │ 1 │ 1 │
│ 2 │ A │ 2 │ 2 │
│ 3 │ A │ 3 │ 3 │
│ 4 │ A │ 4 │ 4 │
│ 5 │ B │ 5 │ 1 │
│ 6 │ B │ 6 │ 2 │
│ 7 │ B │ 7 │ 3 │
│ 8 │ B │ 8 │ 4 │
Then unstack, switching the id and variable names (this is why adding the :id column is necessary).
julia> dfnew = unstack(dfl, :variable, :id, :value)
2×5 DataFrame
│ Row │ variable │ 1 │ 2 │ 3 │ 4 │
│ │ Symbol │ Int64⍰ │ Int64⍰ │ Int64⍰ │ Int64⍰ │
├─────┼──────────┼────────┼────────┼────────┼────────┤
│ 1 │ A │ 1 │ 2 │ 3 │ 4 │
│ 2 │ B │ 5 │ 6 │ 7 │ 8 │
Upvotes: 9