AVA

Reputation: 2558

How to create dataframe from DelimitedFiles.readdlm() object?

I am trying to create a DataFrame as follows:

[root@srvr0 ~]# julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.1 (2020-04-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
    
julia> using DataFrames

julia> using DelimitedFiles
    
julia> P,H = readdlm("programminglanguages.csv",',';header=true);

julia> P
73×2 Array{Any,2}:
 1951  "Regional Assembly Language"
 1952  "Autocode"
 1954  "IPL"
 1955  "FLOW-MATIC"
 1957  "FORTRAN"
 1957  "COMTRAN"
 1958  "LISP"
 1958  "ALGOL 58"
 1959  "FACT"
 1959  "COBOL"
 1959  "RPG"
 1962  "APL"
 1962  "Simula"
 1962  "SNOBOL"
 1963  "CPL"
 1964  "Speakeasy"
 1964  "BASIC"
 1964  "PL/I"
 1966  "JOSS"
 1967  "BCPL"
 1968  "Logo"
 1969  "B"
 1970  "Pascal"
 1970  "Forth"
    ⋮  
 1995  "Ada 95"
 1995  "Java"
 1995  "Delphi "
 1995  "JavaScript"
 1995  "PHP"
 1997  "Rebol"
 2000  "ActionScript"
 2001  "C#"
 2001  "D"
 2002  "Scratch"
 2003  "Groovy"
 2003  "Scala"
 2005  "F#"
 2006  "PowerShell"
 2007  "Clojure"
 2009  "Go"
 2010  "Rust"
 2011  "Dart"
 2011  "Kotlin"
 2011  "Red"
 2011  "Elixir"
 2012  "Julia"
 2014  "Swift"

julia> H
1×2 Array{AbstractString,2}:
 "year"  "language"

julia> typeof(P)
Array{Any,2}

julia> typeof(H)
Array{AbstractString,2}

julia> vec(H)
2-element Array{AbstractString,1}:
 "year"
 "language"

julia> typeof(vec(H))
Array{AbstractString,1}

julia> DataFrame(P, H)

But I am getting the following error:

ERROR: MethodError: no method matching DataFrame(::Array{Any,2}, ::Array{AbstractString,2})
Closest candidates are:
  DataFrame(::AbstractArray{T,2} where T) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
  DataFrame(::AbstractArray{T,2} where T, ::AbstractArray{Symbol,1}; makeunique) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
  DataFrame(::T; copycols) where T at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/other/tables.jl:23
Stacktrace:
 [1] top-level scope at REPL[10]:1

Update 1: With reference to Dr. Bogumil's solution:

julia> DataFrame(P, vec(H))
ERROR: MethodError: no method matching DataFrame(::Array{Any,2}, ::Array{AbstractString,1})
Closest candidates are:
  DataFrame(::AbstractArray{T,2} where T) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
  DataFrame(::AbstractArray{T,2} where T, ::AbstractArray{Symbol,1}; makeunique) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
  DataFrame(::T; copycols) where T at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/other/tables.jl:23
Stacktrace:
 [1] top-level scope at REPL[13]:1

julia> 

Please guide me in creating a DataFrame with a header from the readdlm output.

Update 2:

I got it working by trial and error:

julia> df1=DataFrame(P, Symbol.(vec(H)))
73×2 DataFrame
│ Row │ year │ language                   │
│     │ Any  │ Any                        │
├─────┼──────┼────────────────────────────┤
│ 1   │ 1951 │ Regional Assembly Language │
│ 2   │ 1952 │ Autocode                   │
│ 3   │ 1954 │ IPL                        │
│ 4   │ 1955 │ FLOW-MATIC                 │
│ 5   │ 1957 │ FORTRAN                    │
│ 6   │ 1957 │ COMTRAN                    │
│ 7   │ 1958 │ LISP                       │
│ 8   │ 1958 │ ALGOL 58                   │
│ 9   │ 1959 │ FACT                       │
│ 10  │ 1959 │ COBOL                      │
│ 11  │ 1959 │ RPG                        │
│ 12  │ 1962 │ APL                        │
│ 13  │ 1962 │ Simula                     │
│ 14  │ 1962 │ SNOBOL                     │
│ 15  │ 1963 │ CPL                        │
│ 16  │ 1964 │ Speakeasy                  │
│ 17  │ 1964 │ BASIC                      │
│ 18  │ 1964 │ PL/I                       │
│ 19  │ 1966 │ JOSS                       │
│ 20  │ 1967 │ BCPL                       │
│ 21  │ 1968 │ Logo                       │
⋮
│ 52  │ 1995 │ Java                       │
│ 53  │ 1995 │ Delphi                     │
│ 54  │ 1995 │ JavaScript                 │
│ 55  │ 1995 │ PHP                        │
│ 56  │ 1997 │ Rebol                      │
│ 57  │ 2000 │ ActionScript               │
│ 58  │ 2001 │ C#                         │
│ 59  │ 2001 │ D                          │
│ 60  │ 2002 │ Scratch                    │
│ 61  │ 2003 │ Groovy                     │
│ 62  │ 2003 │ Scala                      │
│ 63  │ 2005 │ F#                         │
│ 64  │ 2006 │ PowerShell                 │
│ 65  │ 2007 │ Clojure                    │
│ 66  │ 2009 │ Go                         │
│ 67  │ 2010 │ Rust                       │
│ 68  │ 2011 │ Dart                       │
│ 69  │ 2011 │ Kotlin                     │
│ 70  │ 2011 │ Red                        │
│ 71  │ 2011 │ Elixir                     │
│ 72  │ 2012 │ Julia                      │
│ 73  │ 2014 │ Swift                      │

Upvotes: 1

Views: 527

Answers (1)

Nils Gudat

Reputation: 13800

This is hard to answer exactly, but the error just tells you that you can't pass two matrices to the DataFrame constructor.

The possible constructors for DataFrame are listed in the DataFrames.jl documentation. The one that seems closest to what you might want is probably

DataFrame(columns::AbstractVecOrMat, names::Union{AbstractVector, Symbol};
          makeunique::Bool=false, copycols::Bool=true)

adapted to your use case (I'm creating a random P and a simple vector H with column names here as of course I don't have your data):

julia> P = Any[rand() for i ∈ 1:3, j ∈ 1:3]
3×3 Matrix{Any}:
 0.0413352  0.41672   0.266163
 0.487072   0.308392  0.810582
 0.470833   0.459017  0.165082

julia> H = string.('a':'c')
3-element Vector{String}:
 "a"
 "b"
 "c"

julia> DataFrame(P, H)
3×3 DataFrame
 Row │ a          b         c        
     │ Any        Any       Any      
─────┼───────────────────────────────
   1 │ 0.0413352  0.41672   0.266163
   2 │ 0.487072   0.308392  0.810582
   3 │ 0.470833   0.459017  0.165082
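
On the older DataFrames release shown in the question, the candidate list in the error only accepts names as `::AbstractArray{Symbol,1}`, so the header has to be flattened and converted to Symbols first. Below is a minimal sketch of that conversion; the three sample rows are hypothetical stand-ins for the real readdlm output, and the final type-narrowing step is an optional extra, not something the question requires.

```julia
using DataFrames

# Hypothetical stand-ins shaped like the question's readdlm output:
# P is a Matrix{Any} of data, H is a 1×2 header matrix.
P = Any[1957 "FORTRAN"; 1958 "LISP"; 2012 "Julia"]
H = AbstractString["year" "language"]

# vec flattens the 1×2 header matrix into a vector;
# Symbol.() converts each name to a Symbol.
df = DataFrame(P, Symbol.(vec(H)))

# Optional: the columns come out with eltype Any, so narrow them explicitly.
df.year = Int.(df.year)
df.language = String.(df.language)
```

Passing Symbols should work on both old and current DataFrames releases, whereas passing strings as names, as in the example above, relies on a newer release.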

EDIT: I should also have recommended just using the excellent CSV package. The issue you're facing, as Bogumil points out in the comments, is that readdlm returns the header as a 1×2 Matrix rather than a vector. With CSV you could have just done:

using CSV, DataFrames

df = CSV.read("programminglanguages.csv", DataFrame)
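
To try this without the original file, here is a small sketch that reads the same layout from an in-memory buffer. The CSV text is a hypothetical sample, not the asker's data, and it assumes a CSV.jl release that supports the two-argument `CSV.read(source, sink)` form.

```julia
using CSV, DataFrames

# Hypothetical in-memory stand-in for programminglanguages.csv
csv_text = """
year,language
1957,FORTRAN
1958,LISP
2012,Julia
"""

# CSV.read takes the header from the first row and infers a concrete
# element type for each column instead of Any.
df = CSV.read(IOBuffer(csv_text), DataFrame)
```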

Upvotes: 1
