Alec

Reputation: 4482

In Julia, how can I combine multiple dataframes if some columns are different?

I have several dataframes, in no particular order, whose column names are similar but not identical. E.g.:

DataFrame      Columns
DataFrame 1:   A, B, C
DataFrame 2:   A, B, C
DataFrame 3:   A, B, C, D
DataFrame 4:   A, C, D

I would like to have them stacked / concatenated / appended. I don't care how the missing data is filled in for dataframes missing a given column.

That is, I'd like a single dataframe:

Combined DataFrame: A, B, C, D

Upvotes: 4

Views: 2199

Answers (2)

Krish

Reputation: 1081

A little late, but I found a post that suggested outerjoin.

Let's say you have two DataFrames, sku and inventory, that look like this:

4×2 DataFrame
│ Row │ item           │ sku   │
│     │ String         │ Int64 │
├─────┼────────────────┼───────┤
│ 1   │ Mars Rover     │ 34566 │
│ 2   │ Venus Explorer │ 78945 │
│ 3   │ Lunar Rover    │ 15179 │
│ 4   │ 30% Sun Filter │ 77254 │

4×3 DataFrame
 Row │ item            id     kind
     │ String          Int64  String
─────┼───────────────────────────────────
   1 │ Mars Rover        100  Rover
   2 │ Venus Explorer    101  Spaceship
   3 │ Lunar Rover       102  Rover
   4 │ 30% Sun Filter    103  Sun Filter

You can join these dataframes with outerjoin.

inventory_sku = outerjoin(inventory, sku, on = :item)

We get:

4×4 DataFrame
 Row │ item            id     kind        sku
     │ String          Int64  String      Int64
─────┼──────────────────────────────────────────
   1 │ Mars Rover        100  Rover       34566
   2 │ Venus Explorer    101  Spaceship   78945
   3 │ Lunar Rover       102  Rover       15179
   4 │ 30% Sun Filter    103  Sun Filter  77254
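For anyone who wants to try this, here is a minimal self-contained sketch that rebuilds the two tables and runs the same join. It assumes DataFrames.jl 1.x is installed and that the first table is named sku and the second inventory, as implied by the outerjoin call. In recent DataFrames versions the row order of an outer join is not guaranteed by default, so the check below looks rows up by item rather than by position.

```julia
using DataFrames

# Rebuild the two example tables (names assumed from the outerjoin call above)
sku = DataFrame(item = ["Mars Rover", "Venus Explorer", "Lunar Rover", "30% Sun Filter"],
                sku = [34566, 78945, 15179, 77254])
inventory = DataFrame(item = ["Mars Rover", "Venus Explorer", "Lunar Rover", "30% Sun Filter"],
                      id = [100, 101, 102, 103],
                      kind = ["Rover", "Spaceship", "Rover", "Sun Filter"])

# outerjoin keeps every key from both sides; matching rows are merged side by side,
# with the left table's columns first, then the right table's non-key columns
inventory_sku = outerjoin(inventory, sku, on = :item)
println(inventory_sku)

# Look a row up by its key instead of relying on row order
row = findfirst(==("Lunar Rover"), inventory_sku.item)
println(inventory_sku[row, :sku])
```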

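This is different from what the OP asked for: outerjoin matches rows on a key column (:item here) and places the columns side by side, rather than stacking rows. Still, it might be useful to others, and IMHO it's much cleaner.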

Upvotes: 0

Bogumił Kamiński

Reputation: 69949

If you want vertical concatenation, do:

julia> dfs = [DataFrame(permutedims(1:n), :auto) for n in 1:5]
5-element Vector{DataFrame}:
 1×1 DataFrame
 Row │ x1
     │ Int64
─────┼───────
   1 │     1
 1×2 DataFrame
 Row │ x1     x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
 1×3 DataFrame
 Row │ x1     x2     x3
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3
 1×4 DataFrame
 Row │ x1     x2     x3     x4
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      3      4
 1×5 DataFrame
 Row │ x1     x2     x3     x4     x5
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      2      3      4      5

julia> vcat(dfs[1], dfs[2], dfs[3], dfs[4], dfs[5], cols=:union)
5×5 DataFrame
 Row │ x1     x2       x3       x4       x5
     │ Int64  Int64?   Int64?   Int64?   Int64?
─────┼───────────────────────────────────────────
   1 │     1  missing  missing  missing  missing
   2 │     1        2  missing  missing  missing
   3 │     1        2        3  missing  missing
   4 │     1        2        3        4  missing
   5 │     1        2        3        4        5
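Applied to the layout from the question, a sketch along these lines should produce the A, B, C, D table. The four DataFrames and their values are made up for illustration, and reduce(vcat, ...; cols = :union) is used as a shorthand for passing every DataFrame to a single vcat call (assumes DataFrames.jl 1.x). With cols = :union, columns appear in the order they are first seen, and gaps are filled with missing.

```julia
using DataFrames

# Toy tables mirroring the column layout in the question (values are hypothetical)
df1 = DataFrame(A = [1], B = [2], C = [3])
df2 = DataFrame(A = [4], B = [5], C = [6])
df3 = DataFrame(A = [7], B = [8], C = [9], D = [10])
df4 = DataFrame(A = [11], C = [12], D = [13])

# Stack all of them; columns missing from a given table become `missing` in its rows
combined = reduce(vcat, [df1, df2, df3, df4]; cols = :union)
println(combined)
```

The reduce form is convenient when the DataFrames already live in a vector, since it avoids writing out dfs[1], dfs[2], ... by hand.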

If you want to append in place (this mutates dfs[1]), do:

julia> dfs = [DataFrame(permutedims(1:n), :auto) for n in 1:5]
5-element Vector{DataFrame}:
 1×1 DataFrame
 Row │ x1
     │ Int64
─────┼───────
   1 │     1
 1×2 DataFrame
 Row │ x1     x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
 1×3 DataFrame
 Row │ x1     x2     x3
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3
 1×4 DataFrame
 Row │ x1     x2     x3     x4
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      3      4
 1×5 DataFrame
 Row │ x1     x2     x3     x4     x5
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      2      3      4      5

julia> append!(dfs[1], dfs[2], cols=:union)
2×2 DataFrame
 Row │ x1     x2
     │ Int64  Int64?
─────┼────────────────
   1 │     1  missing
   2 │     1        2

julia> append!(dfs[1], dfs[3], cols=:union)
3×3 DataFrame
 Row │ x1     x2       x3
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     1  missing  missing
   2 │     1        2  missing
   3 │     1        2        3

julia> append!(dfs[1], dfs[4], cols=:union)
4×4 DataFrame
 Row │ x1     x2       x3       x4
     │ Int64  Int64?   Int64?   Int64?
─────┼──────────────────────────────────
   1 │     1  missing  missing  missing
   2 │     1        2  missing  missing
   3 │     1        2        3  missing
   4 │     1        2        3        4

julia> append!(dfs[1], dfs[5], cols=:union)
5×5 DataFrame
 Row │ x1     x2       x3       x4       x5
     │ Int64  Int64?   Int64?   Int64?   Int64?
─────┼───────────────────────────────────────────
   1 │     1  missing  missing  missing  missing
   2 │     1        2  missing  missing  missing
   3 │     1        2        3  missing  missing
   4 │     1        2        3        4  missing
   5 │     1        2        3        4        5

julia> dfs[1]
5×5 DataFrame
 Row │ x1     x2       x3       x4       x5
     │ Int64  Int64?   Int64?   Int64?   Int64?
─────┼───────────────────────────────────────────
   1 │     1  missing  missing  missing  missing
   2 │     1        2  missing  missing  missing
   3 │     1        2        3  missing  missing
   4 │     1        2        3        4  missing
   5 │     1        2        3        4        5

Upvotes: 6
