merchmallow
merchmallow

Reputation: 794

First row of a DF as column names

How I do a make this first row as my column names instead of x1, x2....xn?

Thanks

DataFrame

I am able to load the data with XLSX.readtable, but unable to convert it to DataFrame:

dir = expanduser("~/Downloads/")
using DataFrames, XLSX
f = dir * "data.xlsx"
d = XLSX.readtable(f, "dados", "A:C", first_row = 5, header = false)
df = DataFrame(d)

ArgumentError: 'Tuple{Vector{Any}, Vector{Symbol}}' iterates 'Vector{Any}' values, which doesn't satisfy the Tables.jl `AbstractRow` interface

This fails at the last step. The spreadsheet looks like pretty standard data. If I dump the data, I get:

Tuple{Vector{Any}, Vector{Symbol}}
  1: Array{Any}((3,))
    1: Array{Any}((225,))
      1: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 731672
      2: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 731702
      3: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 731733
      4: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 731763
      5: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 731794
      ...
      221: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 738368
      222: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 738399
      223: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 738429
      224: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 738460
      225: Dates.Date
        instant: Dates.UTInstant{Dates.Day}
          periods: Dates.Day
            value: Int64 738490
    2: Array{Any}((225,))
      1: Float64 329.16091
      2: Float64 303.23791
      3: Float64 284.96296
      4: Float64 283.10436
      5: Float64 286.08795
      ...
      221: Missing missing
      222: Missing missing
      223: Missing missing
      224: Missing missing
      225: Missing missing
    3: Array{Any}((225,))
      1: Float64 189.49076
      2: Float64 191.64219
      3: Float64 194.163
      4: Float64 194.49731
      5: Float64 198.85504
      ...
      221: Missing missing
      222: Missing missing
      223: Missing missing
      224: Missing missing
      225: Missing missing
  2: Array{Symbol}((3,))
    1: Symbol A
    2: Symbol B
    3: Symbol C

The spreadsheet may be downloaded at the link I have indicated in the comment section.

Upvotes: 2

Views: 1375

Answers (2)

BallpointBen
BallpointBen

Reputation: 13750

There is XLSX.readtable, which has signature

readtable(filepath, sheet, [columns]; [first_row], <args omitted for brevity>)

You can pass first_row to tell it where the data begins.

Upvotes: 1

Przemyslaw Szufel
Przemyslaw Szufel

Reputation: 42214

You should have read this file using CSV.read which by default processes column names. However having a DataFrame such as this it is still possible to rename columns.

However let's assume you have such data:

julia> df = DataFrame([["a" "b"];[3 7]], :auto)
2×2 DataFrame
 Row │ x1   x2
     │ Any  Any
─────┼──────────
   1 │ a    b
   2 │ 3    7

You can use rename! to assign the column names:

julia> rename!(df, Symbol.(Vector(df[1,:])))[2:end,:]
1×2 DataFrame
 Row │ a    b
     │ Any  Any
─────┼──────────
   1 │ 3    7

Upvotes: 3

Related Questions