hmcgowan01

Reputation: 31

Julia - Selecting subset of a dataframe conditioned on a column in another dataframe

In Julia I have two dataframes, and I want to return a dataframe containing the rows of the first dataframe whose Fund column holds a fund that appears in the second dataframe. A simple example would be:

df1 = DataFrame(Fund = ["AAA", "AAA", "BBB", "CCC", "DDD"], Purchase = [1000, 500, 600, 800,900])

df2 = DataFrame(Fund = ["AAA", "CCC"], Totals =[1000,200])

and what I would like to return is:

df3 = DataFrame(Fund = ["AAA", "AAA","CCC"], Purchase = [1000, 500, 800])

I have about 10 columns in df1 and a few thousand rows. The "Fund" column in df2 will always contain unique funds, will always be a subset of df1.Fund, and may itself contain more than 1,000 rows.

I am new to Julia and have created the function below and was wondering if there was a better way of solving this.

function newtransactions(df1, df2)
    res = DataFrame([Any[], Any[]], ["Fund", "Purchase"])
    for t ∈ df2.Fund
        res = append!(res, subset(df1, :Fund => x -> (x .== t)))
    end
    return res
end

Upvotes: 2

Views: 374

Answers (1)

Przemyslaw Szufel

Reputation: 42244

You need to perform an innerjoin:

julia> innerjoin(df1, df2, on=:Fund)
3×3 DataFrame
 Row │ Fund    Purchase  Totals
     │ String  Int64     Int64
─────┼──────────────────────────
   1 │ AAA         1000    1000
   2 │ AAA          500    1000
   3 │ CCC          800     200

Note that there are also leftjoin and rightjoin if you instead need to keep all rows from the first or second table.
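The innerjoin result above also carries the Totals column from df2, whereas the df3 in the question has only the columns of df1. As a minimal sketch, assuming that only df1's columns are wanted, DataFrames.jl's semijoin or a plain membership filter could be used instead. The names df3, df3_alt and funds are illustrative, and building a Set is an assumption made here to keep lookups cheap when df2 has many rows.

```julia
using DataFrames

df1 = DataFrame(Fund = ["AAA", "AAA", "BBB", "CCC", "DDD"],
                Purchase = [1000, 500, 600, 800, 900])
df2 = DataFrame(Fund = ["AAA", "CCC"], Totals = [1000, 200])

# semijoin keeps the rows of df1 whose Fund occurs in df2
# and returns only df1's columns (Totals is not added).
df3 = semijoin(df1, df2, on = :Fund)

# Alternative without a join: keep each row whose Fund is a member of df2.Fund.
# The Set is an assumption of this sketch, not something the answer requires.
funds = Set(df2.Fund)
df3_alt = filter(:Fund => in(funds), df1)
```

Both variants leave df1 untouched and return a new data frame. If the Totals column is useful downstream, innerjoin remains the more direct choice.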

Upvotes: 3
