Reputation: 4482
I have unordered dataframes with similar, but not quite equal column keys. E.g.:
DataFrame: Columns
Dataframe 1: A, B, C
Dataframe 2: A, B, C
Dataframe 3: A, B, C, D
Dataframe 4: A, C, D
I would like to have them stacked / concatenated / appended. I don't care how the missing data is filled in for dataframes missing a given column.
That is, I'd like a single dataframe:
DataFrame combined: A, B, C, D
Upvotes: 4
Views: 2199
Reputation: 1081
A little late, but I found this post, which suggested outerjoin.
Let's say you have two DataFrames that look like this:
4×2 DataFrame
│ Row │ item │ sku │
│ │ String │ Int64 │
├─────┼────────────────┼───────┤
│ 1 │ Mars Rover │ 34566 │
│ 2 │ Venus Explorer │ 78945 │
│ 3 │ Lunar Rover │ 15179 │
│ 4 │ 30% Sun Filter │ 77254 │
4×3 DataFrame
Row │ item id kind
│ String Int64 String
─────┼───────────────────────────────────
1 │ Mars Rover 100 Rover
2 │ Venus Explorer 101 Spaceship
3 │ Lunar Rover 102 Rover
4 │ 30% Sun Filter 103 Sun Filter
You can join these dataframes with outerjoin.
inventory_sku = outerjoin(inventory, sku, on = :item)
We get:
4×4 DataFrame
Row │ item id kind sku
│ String Int64 String Int64
─────┼──────────────────────────────────────────
1 │ Mars Rover 100 Rover 34566
2 │ Venus Explorer 101 Spaceship 78945
3 │ Lunar Rover 102 Rover 15179
4 │ 30% Sun Filter 103 Sun Filter 77254
This is subtly different from what the OP wants (it matches rows on a key column rather than stacking rows), but it might be useful to others, and it's much cleaner IMHO.
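For anyone who wants to run the join, here is a minimal sketch. The two tables are rebuilt by hand from the printed output above, so the construction code is an assumption rather than the original post's code; only the `outerjoin` call is taken from the answer.

```julia
using DataFrames

# Hand-built copies of the two tables printed above (assumed construction).
sku = DataFrame(
    item = ["Mars Rover", "Venus Explorer", "Lunar Rover", "30% Sun Filter"],
    sku  = [34566, 78945, 15179, 77254],
)
inventory = DataFrame(
    item = ["Mars Rover", "Venus Explorer", "Lunar Rover", "30% Sun Filter"],
    id   = [100, 101, 102, 103],
    kind = ["Rover", "Spaceship", "Rover", "Sun Filter"],
)

# Match rows on :item. A row present in only one table would still be kept,
# with `missing` in the columns that come from the other table.
inventory_sku = outerjoin(inventory, sku, on = :item)
```

Note that a join widens rows that share a key, whereas the question asks for rows to be stacked, which is why this only fits when the frames describe the same items.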
Upvotes: 0
Reputation: 69949
If you want vertical concatenation, do:
julia> dfs = [DataFrame(permutedims(1:n), :auto) for n in 1:5]
5-element Vector{DataFrame}:
1×1 DataFrame
Row │ x1
│ Int64
─────┼───────
1 │ 1
1×2 DataFrame
Row │ x1 x2
│ Int64 Int64
─────┼──────────────
1 │ 1 2
1×3 DataFrame
Row │ x1 x2 x3
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 2 3
1×4 DataFrame
Row │ x1 x2 x3 x4
│ Int64 Int64 Int64 Int64
─────┼────────────────────────────
1 │ 1 2 3 4
1×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 2 3 4 5
julia> vcat(dfs[1], dfs[2], dfs[3], dfs[4], dfs[5], cols=:union)
5×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Int64 Int64? Int64? Int64? Int64?
─────┼───────────────────────────────────────────
1 │ 1 missing missing missing missing
2 │ 1 2 missing missing missing
3 │ 1 2 3 missing missing
4 │ 1 2 3 4 missing
5 │ 1 2 3 4 5
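When the dataframes already live in a vector, listing every element by hand is not necessary. A minimal sketch, assuming made-up values laid out like the A, B, C, D example in the question (the names `df1` to `df4` and `combined` are illustrative only):

```julia
using DataFrames

# Hypothetical data mirroring the column layout described in the question.
df1 = DataFrame(A = [1],  B = [2], C = [3])
df2 = DataFrame(A = [4],  B = [5], C = [6])
df3 = DataFrame(A = [7],  B = [8], C = [9], D = [10])
df4 = DataFrame(A = [11], C = [12], D = [13])

dfs = [df1, df2, df3, df4]

# reduce with vcat accepts the same `cols` keyword as vcat itself,
# so the vector can be of any length.
combined = reduce(vcat, dfs; cols = :union)
```

Columns absent from a given dataframe are filled with `missing`, and the inputs are left unmodified.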
If you want to append in place (this mutates the first dataframe), do:
julia> dfs = [DataFrame(permutedims(1:n), :auto) for n in 1:5]
5-element Vector{DataFrame}:
1×1 DataFrame
Row │ x1
│ Int64
─────┼───────
1 │ 1
1×2 DataFrame
Row │ x1 x2
│ Int64 Int64
─────┼──────────────
1 │ 1 2
1×3 DataFrame
Row │ x1 x2 x3
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 2 3
1×4 DataFrame
Row │ x1 x2 x3 x4
│ Int64 Int64 Int64 Int64
─────┼────────────────────────────
1 │ 1 2 3 4
1×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 2 3 4 5
julia> append!(dfs[1], dfs[2], cols=:union)
2×2 DataFrame
Row │ x1 x2
│ Int64 Int64?
─────┼────────────────
1 │ 1 missing
2 │ 1 2
julia> append!(dfs[1], dfs[3], cols=:union)
3×3 DataFrame
Row │ x1 x2 x3
│ Int64 Int64? Int64?
─────┼─────────────────────────
1 │ 1 missing missing
2 │ 1 2 missing
3 │ 1 2 3
julia> append!(dfs[1], dfs[4], cols=:union)
4×4 DataFrame
Row │ x1 x2 x3 x4
│ Int64 Int64? Int64? Int64?
─────┼──────────────────────────────────
1 │ 1 missing missing missing
2 │ 1 2 missing missing
3 │ 1 2 3 missing
4 │ 1 2 3 4
julia> append!(dfs[1], dfs[5], cols=:union)
5×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Int64 Int64? Int64? Int64? Int64?
─────┼───────────────────────────────────────────
1 │ 1 missing missing missing missing
2 │ 1 2 missing missing missing
3 │ 1 2 3 missing missing
4 │ 1 2 3 4 missing
5 │ 1 2 3 4 5
julia> dfs[1]
5×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Int64 Int64? Int64? Int64? Int64?
─────┼───────────────────────────────────────────
1 │ 1 missing missing missing missing
2 │ 1 2 missing missing missing
3 │ 1 2 3 missing missing
4 │ 1 2 3 4 missing
5 │ 1 2 3 4 5
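The repeated `append!` calls can also be written as a loop. This is a sketch rather than part of the original answer; it appends into a copy (the name `combined` is illustrative) so that `dfs[1]` is not modified:

```julia
using DataFrames

dfs = [DataFrame(permutedims(1:n), :auto) for n in 1:5]

# Start from a copy so that dfs[1] itself stays 1×1.
combined = copy(dfs[1])

# Each append! adds the rows of df and creates any column that
# combined does not have yet, filling earlier rows with `missing`.
for df in dfs[2:end]
    append!(combined, df; cols = :union)
end
```

Drop the `copy` if mutating the first dataframe is acceptable, which matches the behavior shown above.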
Upvotes: 6