logankilpatrick
logankilpatrick

Reputation: 14551

How to join data frames in Julia (Inner, Left, Right, Outer, Semi, Anti, Cross)

I am trying to figure out the different ways I can join (merge) data frames in Julia. How do I do these different types of joins?

Upvotes: 1

Views: 2265

Answers (1)

logankilpatrick
logankilpatrick

Reputation: 14551

From the DataFrames.jl docs:

There are seven kinds of joins supported by the DataFrames package:

Inner: The output contains rows for values of the key that exist in both the first (left) and second (right) arguments to join.

Left: The output contains rows for values of the key that exist in the first (left) argument to join, whether or not that value exists in the second (right) argument.

Right: The output contains rows for values of the key that exist in the second (right) argument to join, whether or not that value exists in the first (left) argument.

Outer: The output contains rows for values of the key that exist in the first (left) or second (right) argument to join.

Semi: Like an inner join, but output is restricted to columns from the first (left) argument to join.

Anti: The output contains rows for values of the key that exist in the first (left) but not the second (right) argument to join. As with semi joins, output is restricted to columns from the first (left) argument.

Cross: The output is the cartesian product of rows from the first (left) and second (right) arguments to join.

julia> people = DataFrame(ID = [20, 40], Name = ["John Doe", "Jane Doe"])
2×2 DataFrame
│ Row │ ID    │ Name     │
│     │ Int64 │ String   │
├─────┼───────┼──────────┤
│ 1   │ 20    │ John Doe │
│ 2   │ 40    │ Jane Doe │

julia> jobs = DataFrame(ID = [20, 60], Job = ["Lawyer", "Astronaut"])
2×2 DataFrame
│ Row │ ID    │ Job       │
│     │ Int64 │ String    │
├─────┼───────┼───────────┤
│ 1   │ 20    │ Lawyer    │
│ 2   │ 60    │ Astronaut │

julia> join(people, jobs, on = :ID, kind = :inner)
1×3 DataFrame
│ Row │ ID    │ Name     │ Job    │
│     │ Int64 │ String   │ String │
├─────┼───────┼──────────┼────────┤
│ 1   │ 20    │ John Doe │ Lawyer │

julia> join(people, jobs, on = :ID, kind = :left)
2×3 DataFrame
│ Row │ ID    │ Name     │ Job     │
│     │ Int64 │ String   │ String⍰ │
├─────┼───────┼──────────┼─────────┤
│ 1   │ 20    │ John Doe │ Lawyer  │
│ 2   │ 40    │ Jane Doe │ missing │

julia> join(people, jobs, on = :ID, kind = :right)
2×3 DataFrame
│ Row │ ID    │ Name     │ Job       │
│     │ Int64 │ String⍰  │ String    │
├─────┼───────┼──────────┼───────────┤
│ 1   │ 20    │ John Doe │ Lawyer    │
│ 2   │ 60    │ missing  │ Astronaut │

julia> join(people, jobs, on = :ID, kind = :outer)
3×3 DataFrame
│ Row │ ID    │ Name     │ Job       │
│     │ Int64 │ String⍰  │ String⍰   │
├─────┼───────┼──────────┼───────────┤
│ 1   │ 20    │ John Doe │ Lawyer    │
│ 2   │ 40    │ Jane Doe │ missing   │
│ 3   │ 60    │ missing  │ Astronaut │

julia> join(people, jobs, on = :ID, kind = :semi)
1×2 DataFrame
│ Row │ ID    │ Name     │
│     │ Int64 │ String   │
├─────┼───────┼──────────┤
│ 1   │ 20    │ John Doe │

julia> join(people, jobs, on = :ID, kind = :anti)
1×2 DataFrame
│ Row │ ID    │ Name     │
│     │ Int64 │ String   │
├─────┼───────┼──────────┤
│ 1   │ 40    │ Jane Doe │

Upvotes: 4

Related Questions