Reputation: 45
I'm sure this is a very simple question, but please have patience - very new to this.
I have a DataFrame with data including sex, earnings and if a record is "old" or "new". I want to add a new column to this DataFrame, with the value coming from one of four sources (columns) based on if they are M/F and old/new, and also matched to the earnings value.
In excel, I would have used index(Match()) to lookup row based on earnings, and columns based on M/F, old/new.
I've created 4 dictionaries for each of the sources and I can return the value I want for a specific earnings value, but I'm struggling to get this to work for all rows and values of earnings. I've tried a function:
function SizeCoeff(x)
if Sex=="M" && Type == "Old"
get(MaleOldSizeEffect,x,0)
elseif Sex=="F" && Type == "Old"
get(FemaleOldSizeEffect,x,0)
elseif Sex=="F" && Type == "New"
get(FemaleNewSizeEffect,x,0)
else get(MaleNewSizeEffect,x,0)
end
end
SizeCoeff.(earnings)
but this isn't working and is returning the default value for all rows (not to mention probably quite clunky)
I have joined the sources of data to my main df using innerjoin, which takes care of the matching based on earnings, but I'm struggling to only pick up the source that I want based on the final two conditions.
I've been researching all day and have tried transform, subsets, grouped dataframes - I feel like there must be a , so any tips would be gratefully received (and apologies for butchering this code!)
Edit: First two lines of my DF - can't get it to display unless i use a screengrab. The figures in yellow are what I'm trying to produce.
Upvotes: 2
Views: 681
Reputation: 14735
It's not clear what role earnings play in this, or how exactly the coefficients are assigned (for eg., why there are two values under Male_Old
, 1 and 5). But here's an example of adding a column to a DataFrame based on the values of two existing columns:
julia> df = DataFrame(Policy_No = [1, 2],
Sex = [:M, :F],
Type = [:Old, :New]) #create the DF
2×3 DataFrame
Row │ Policy_No Sex Type
│ Int64 Symbol Symbol
─────┼───────────────────────────
1 │ 1 M Old
2 │ 2 F New
julia> sizecoeffs = (M = (Old = 1, New = 2),
F = (Old = 7, New = 8));
julia> transform!(df, [:Sex, :Type] =>
ByRow((sex, type) -> sizecoeffs[sex][type]) =>
:Coeff)
2×4 DataFrame
Row │ Policy_No Sex Type Coeff
│ Int64 Symbol Symbol Int64
─────┼──────────────────────────────────
1 │ 1 M Old 1
2 │ 2 F New 8
Let me know if this helps, and what missing in it with respect to your use case.
Upvotes: 1