Reputation: 13
I'm trying to code my fraud detection algorithm in R. I have a vector of numerical values (fraudval), each proportional to how likely it is that a certain user is committing fraud. How do I create a new column that states whether each value is HIGH, MEDIUM, LOW, or VERY LOW, given some sensitivity thresholds on 'fraudval' (i.e. if 0.3 <= 'fraudval' < 0.6 it's LOW, if it's at least 0.6 but below 0.8 it's MEDIUM, and if it's 0.8 or higher it's HIGH)?
Here are my input and expected output.
The sensitivities (lower bound of each level) are: very low - 0, low - 0.3, medium - 0.6, high - 0.8
input (df):
ID fraudval
1 0.4
2 0.8
3 0.2
4 0.6
output (df):
ID fraudval test
1 0.4 LOW
2 0.8 HIGH
3 0.2 VERY LOW
4 0.6 MEDIUM
Thanks in advance! :D
Upvotes: 1
Views: 96
Reputation: 176638
I would use cut:
R> df$test <- cut(df$fraudval, c(0,.3,.6,.8,Inf),
+ c("VERY LOW", "LOW", "MED", "HIGH"), right=FALSE)
R> df
ID fraudval test
1 1 0.4 LOW
2 2 0.8 HIGH
3 3 0.2 VERY LOW
4 4 0.6 MED
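For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The data frame is rebuilt from the question's input, the names `breaks` and `labels` are only illustrative, and the label "MEDIUM" is an assumption taken from the question's expected output rather than the "MED" used above. The part that matters is right=FALSE, which makes each interval closed on the left, so a score of exactly 0.6 or 0.8 lands in the higher band.

```r
# Rebuild the example data frame from the question.
df <- data.frame(ID = 1:4, fraudval = c(0.4, 0.8, 0.2, 0.6))

# Lower bound of each band, plus Inf so the top band has no upper limit.
breaks <- c(0, 0.3, 0.6, 0.8, Inf)
labels <- c("VERY LOW", "LOW", "MEDIUM", "HIGH")

# right = FALSE gives left-closed intervals: [0, 0.3), [0.3, 0.6),
# [0.6, 0.8), [0.8, Inf). The default (right = TRUE) would instead
# put 0.6 in LOW and 0.8 in MEDIUM.
df$test <- cut(df$fraudval, breaks = breaks, labels = labels, right = FALSE)

print(df)
```

Note that cut returns a factor, so wrap it in as.character() if plain strings are needed, and that any value outside the breaks (here, anything below 0) comes back as NA.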
Upvotes: 1