imantha

Reputation: 3838

Joining Multiple Data Frames

I am wondering whether there is a way in Julia's DataFrames.jl to join multiple data frames in one go:

using DataFrames

employer = DataFrame(
    ID = Array{Int64}([01,02,03,04,05,09,11,20]),
    name = Array{String}(["Matthews","Daniella", "Kofi", "Vladmir", "Jean", "James", "Ayo", "Bill"])
)

salary = DataFrame(
    ID = Array{Int64}([01,02,03,04,05,06,08,23]),
    amount = Array{Int64}([2050,3000,3500,3500,2500,3400,2700,4500])
)

hours = DataFrame(
    ID = Array{Int64}([01,02,03,04,05,08,09,23]),
    time = Array{Int64}([40,40,40,40,40,38,45,50])
)

# I tried putting them in an array, but of course that results in an error
empSalHrs = innerjoin([employer,salary,hours], on = :ID)

# In Python you can achieve this using
import pandas as pd 
from functools import reduce

df = reduce(lambda l,r : pd.merge(l,r, on = "ID"), [employer, salary, hours])

Is there a similar way to do this in Julia?

Upvotes: 4

Answers (1)

Andrej Oskin

Reputation: 2342

You were almost there. As the DataFrames.jl manual describes, you just need to pass the data frames as separate positional arguments:

using DataFrames

employer = DataFrame(
    ID = [01,02,03,04,05,09,11,20],
    name = ["Matthews","Daniella", "Kofi", "Vladmir", "Jean", "James", "Ayo", "Bill"]
)

salary = DataFrame(
    ID = [01,02,03,04,05,06,08,23],
    amount = [2050,3000,3500,3500,2500,3400,2700,4500]
)

hours = DataFrame(
    ID = [01,02,03,04,05,08,09,23],
    time = [40,40,40,40,40,38,45,50]
)

empSalHrs = innerjoin(employer,salary,hours, on = :ID)

If for some reason you need to keep your data frames in a Vector, you can use splatting (the ... operator) to achieve the same result:

empSalHrs = innerjoin([employer,salary,hours]..., on = :ID)
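For comparison with the functools.reduce idiom from the question, here is a minimal sketch that folds the same vector pairwise with Base's reduce and checks it against the splatted call. The names frames, reduced, and splatted are only illustrative, and the sketch assumes a DataFrames.jl version that provides innerjoin.

```julia
using DataFrames

employer = DataFrame(
    ID = [1, 2, 3, 4, 5, 9, 11, 20],
    name = ["Matthews", "Daniella", "Kofi", "Vladmir", "Jean", "James", "Ayo", "Bill"]
)
salary = DataFrame(
    ID = [1, 2, 3, 4, 5, 6, 8, 23],
    amount = [2050, 3000, 3500, 3500, 2500, 3400, 2700, 4500]
)
hours = DataFrame(
    ID = [1, 2, 3, 4, 5, 8, 9, 23],
    time = [40, 40, 40, 40, 40, 38, 45, 50]
)

# Illustrative name: the data frames collected in a Vector
frames = [employer, salary, hours]

# Fold the vector pairwise, analogous to reduce(lambda l, r: pd.merge(l, r, on="ID"), ...)
reduced = reduce((l, r) -> innerjoin(l, r, on = :ID), frames)

# Splatting passes the vector's elements as separate positional arguments
splatted = innerjoin(frames..., on = :ID)

# Both approaches keep only the IDs present in all three data frames
println(sort(reduced.ID) == sort(splatted.ID))
```

The splatted form is shorter. The reduce form may be handy if each pairwise join needs different keyword arguments.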

Also, note that I've slightly changed the definitions of the data frames. Array{Int} leaves the number of dimensions unspecified, so it is not a concrete type, and using it in declarations (for example, as a field type) is bad for performance. It may not matter in this particular scenario, but it's better to build good habits from the start. Instead of Array{Int} one can use

  • Array{Int, 1}([1, 2, 3, 4])
  • Vector{Int}([1, 2, 3, 4])
  • Int[1, 2, 3]
  • [1, 2, 3]

The last one is legit because Julia can infer the type of the container on its own in this simple scenario.
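The point about container types can be checked directly in the REPL. The following is a small sketch using only Base functions; the variable names are illustrative.

```julia
# Array{Int} leaves the dimension parameter free, so it is not a concrete type
loose = isconcretetype(Array{Int})

# Vector{Int} is an alias for Array{Int, 1}, which is concrete
tight = isconcretetype(Vector{Int})

# The four spellings from the list above
a = Array{Int, 1}([1, 2, 3, 4])
b = Vector{Int}([1, 2, 3, 4])
c = Int[1, 2, 3]
d = [1, 2, 3]

# Used as a constructor on a vector, Array{Int} still returns a Vector{Int}
e = Array{Int}([1, 2, 3])

all_vectors = all(x -> x isa Vector{Int}, (a, b, c, d, e))
println((loose, tight, all_vectors))
```

Since the constructor call also yields a Vector{Int}, the columns in the question end up concretely typed either way. The difference matters mostly when the type is written in an annotation.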

Upvotes: 5
