Reputation: 3838
I am wondering if there is a way in Julia DataFrames to join multiple data frames in one go. Here is my setup:
using DataFrames
employer = DataFrame(
ID = Array{Int64}([01,02,03,04,05,09,11,20]),
name = Array{String}(["Matthews","Daniella", "Kofi", "Vladmir", "Jean", "James", "Ayo", "Bill"])
)
salary = DataFrame(
ID = Array{Int64}([01,02,03,04,05,06,08,23]),
amount = Array{Int64}([2050,3000,3500,3500,2500,3400,2700,4500])
)
hours = DataFrame(
ID = Array{Int64}([01,02,03,04,05,08,09,23]),
time = Array{Int64}([40,40,40,40,40,38,45,50])
)
# I tried passing them as an array, but of course that results in an error
empSalHrs = innerjoin([employer,salary,hours], on = :ID)
# In Python you can achieve this using
import pandas as pd
from functools import reduce
df = reduce(lambda l,r : pd.merge(l,r, on = "ID"), [employer, salary, hours])
Is there a similar way to do this in Julia?
Upvotes: 4
Views: 100
Reputation: 2342
You were almost there. As the DataFrames.jl manual describes, you just need to pass each data frame as a separate positional argument.
using DataFrames
employer = DataFrame(
ID = [01,02,03,04,05,09,11,20],
name = ["Matthews","Daniella", "Kofi", "Vladmir", "Jean", "James", "Ayo", "Bill"])
salary = DataFrame(
ID = [01,02,03,04,05,06,08,23],
amount = [2050,3000,3500,3500,2500,3400,2700,4500])
hours = DataFrame(
ID = [01,02,03,04,05,08,09,23],
time = [40,40,40,40,40,38,45,50]
)
empSalHrs = innerjoin(employer,salary,hours, on = :ID)
If for some reason you need to keep your data frames in a Vector, you can use splatting (the ... operator) to achieve the same result:
empSalHrs = innerjoin([employer,salary,hours]..., on = :ID)
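For a closer analogue of the Python reduce call from the question, the vector can also be folded pairwise. This is a minimal sketch, assuming DataFrames.jl is installed; the frames are a trimmed-down subset of the ones above, and the names frames, viaReduce and viaSplat are only illustrative.

```julia
using DataFrames

# Trimmed-down versions of the frames above (illustrative subset)
employer = DataFrame(ID = [1, 2, 3, 9], name = ["Matthews", "Daniella", "Kofi", "James"])
salary   = DataFrame(ID = [1, 2, 3, 8], amount = [2050, 3000, 3500, 2700])
hours    = DataFrame(ID = [1, 2, 3, 9], time = [40, 40, 40, 45])

frames = [employer, salary, hours]

# Fold the vector pairwise, joining the running result with the next frame
viaReduce = reduce((l, r) -> innerjoin(l, r, on = :ID), frames)

# Splat the vector into separate positional arguments
viaSplat = innerjoin(frames..., on = :ID)
```

Both calls should keep only the IDs present in every frame. Since the row order of a join is not something to rely on, sort by ID before comparing the two results.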
Also, note that I've slightly changed the definitions of the data frames. Array{Int} leaves the number of dimensions unspecified, so it is not a concrete type, and it shouldn't be used in declarations because that is bad for performance. It may not matter in this particular scenario, but it's better to build good habits from the start. Instead of Array{Int} one can use any of the following:
Array{Int, 1}([1, 2, 3, 4])
Vector{Int}([1, 2, 3, 4])
Int[1, 2, 3]
[1, 2, 3]
The last one is legit because Julia can infer the type of the container on its own in this simple scenario.
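The point about Array{Int} can be checked directly with Base introspection functions. The sketch below uses only Base Julia; the variable names are illustrative. Strictly speaking, Array{Int} is a non-concrete type rather than an abstract one, but the practical advice is the same.

```julia
# Array{Int} leaves the dimension parameter open, so it is not concrete
openDims  = isconcretetype(Array{Int})    # false
fixedDims = isconcretetype(Vector{Int})   # true; Vector{Int} is Array{Int, 1}

# All four spellings build the same concrete container type
a = Array{Int, 1}([1, 2, 3])
b = Vector{Int}([1, 2, 3])
c = Int[1, 2, 3]
d = [1, 2, 3]
sameType = typeof(a) == typeof(b) == typeof(c) == typeof(d) == Vector{Int}

# The constructor call from the question still returns a concrete vector
e = Array{Int64}([1, 2, 3])
questionType = typeof(e)                  # Vector{Int64}
```

So the original definitions produce well-typed columns too. The non-concrete type only becomes a performance concern when it appears in annotations, such as struct fields.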
Upvotes: 5