Reputation: 2977
I'm trying to subset a data frame and do some basic calculations in one pass, to avoid duplicating the same function call over and over. The subset part selects specific columns, and the calculations are simple comparisons between various columns.
Here's some data:
agged <- structure(list(
  name = structure(1:12, .Label = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"), class = "factor"),
  data_2018 = c(4, 4, 4, 4, 3, 4, 2, 4, 3, 4, 4, 3),
  data_2017 = c(1, 4, 4, 3, 2, 3, 3, 4, 2, 1, 1, 2),
  pilot = c(2.68421052631579, 2.73684210526316, 3.52631578947368, 3.42105263157895,
            3.05263157894737, 2.78947368421053, 2.21052631578947, 3.68421052631579,
            2.36842105263158, 3.73684210526316, 2.47368421052632, 2.05263157894737),
  all = c(2.77777777777778, 2.85185185185185, 3.62962962962963, 3.51851851851852,
          3.18518518518519, 2.92592592592593, 2.2962962962963, 3.74074074074074,
          2.40740740740741, 3.77777777777778, 2.55555555555556, 2.07407407407407),
  general = c(2.79166666666667, 2.79166666666667, 3.58333333333333, 3.45833333333333,
              3.08333333333333, 2.83333333333333, 2.41666666666667, 3.70833333333333,
              2.54166666666667, 3.79166666666667, 2.54166666666667, 2.16666666666667),
  tool = c("DoS", "DoS", "DoS", "DoS", "DoS", "DoS", "DoS", "DoS", "DoS", "DoS", "DoS", "DoS"),
  status = c(6, 8, 8, 6, 6, 6, 2, 8, 6, 6, 6, 6)),
  row.names = c(NA, 12L), class = "data.frame")
And here's what I've tried:
diffs <- select(agged
, agged$data_2018
, ifelse(agged$data_2018 >= agged$data_2017, 1, -1)
, ifelse(agged$data_2018 >= agged$pilot, 1, -1)
, ifelse(agged$data_2018 >= agged$all, 1, -1)
, ifelse(agged$data_2018 >= agged$general, 1, -1))
But that returns the error:
Error: Each argument must yield either positive or negative integers.
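A likely explanation of that error, sketched in base R: select() only chooses columns, so every extra argument is read as a column selection. agged$data_2018 is a numeric vector (read as column positions), and each ifelse() result holds both 1 and -1, i.e. a mix of positive and negative positions, which select() refuses. The snippet rebuilds only the two columns it needs from the data above and checks the signs of that vector; it illustrates the failure mode, not select()'s internals.

```r
# Only the two columns needed here, copied from the question's data
data_2018 <- c(4, 4, 4, 4, 3, 4, 2, 4, 3, 4, 4, 3)
data_2017 <- c(1, 4, 4, 3, 2, 3, 3, 4, 2, 1, 1, 2)

# The same vector that was passed to select() as a "column selection"
flags <- ifelse(data_2018 >= data_2017, 1, -1)
print(flags)

# It holds both positive and negative values, so as column positions it
# asks to keep some columns and drop others at the same time
mixed_signs <- any(flags > 0) && any(flags < 0)
print(mixed_signs)
```

So the comparisons have to be computed as new columns first (with mutate() or base R), and only then selected.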
I'm hoping the output would look something like:
data_2018 | vs_data_2017 | vs_pilot | vs_all | vs_general
4 | 1 | 1 | -1 | 1
4 | 1 | 1 | -1 | 1
4 | -1 | 1 | 1 | 1
4 | -1 | -1 | 1 | 1
3 | 1 | -1 | 1 | 1
4 | -1 | 1 | 1 | -1
I've tried running just the ifelse part on its own, and that returns a vector of correct integers, so I'm not sure what else to try. Is there a way to do this without dplyr at all? I'd love to be able to do this without that package.
Reputation: 51592
Based on your description, you can use the comparison operator (>=) directly, where dd is your data frame:
(dd$data_2018 >= dd[3:6]) * 1
# data_2017 pilot all general
#1 1 1 1 1
#2 1 1 1 1
#3 1 1 1 1
#4 1 1 1 1
#5 1 0 0 0
#6 1 1 1 1
#7 0 0 0 0
#8 1 1 1 1
#9 1 1 1 1
#10 1 1 1 1
#11 1 1 1 1
#12 1 1 1 1
NOTE: I converted to 0 and 1 instead. You can easily change it to 1 and -1 by using * 2 - 1 instead of * 1.
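A minimal base R sketch of that change, assuming the data frame is agged as in the question and rebuilding only the five columns it uses. It swaps * 1 for * 2 - 1 and adds the vs_ prefix with paste0(), which gives the layout the question asked for. Selecting the comparison columns by name instead of 3:6 is an assumption made here so the code does not depend on column order.

```r
# Rebuild the columns used, copied from the question's data (`agged`)
agged <- data.frame(
  data_2018 = c(4, 4, 4, 4, 3, 4, 2, 4, 3, 4, 4, 3),
  data_2017 = c(1, 4, 4, 3, 2, 3, 3, 4, 2, 1, 1, 2),
  pilot = c(2.68421052631579, 2.73684210526316, 3.52631578947368, 3.42105263157895,
            3.05263157894737, 2.78947368421053, 2.21052631578947, 3.68421052631579,
            2.36842105263158, 3.73684210526316, 2.47368421052632, 2.05263157894737),
  all = c(2.77777777777778, 2.85185185185185, 3.62962962962963, 3.51851851851852,
          3.18518518518519, 2.92592592592593, 2.2962962962963, 3.74074074074074,
          2.40740740740741, 3.77777777777778, 2.55555555555556, 2.07407407407407),
  general = c(2.79166666666667, 2.79166666666667, 3.58333333333333, 3.45833333333333,
              3.08333333333333, 2.83333333333333, 2.41666666666667, 3.70833333333333,
              2.54166666666667, 3.79166666666667, 2.54166666666667, 2.16666666666667)
)

# Columns to compare against data_2018 (chosen by name, not position)
cmp_cols <- c("data_2017", "pilot", "all", "general")

# vector >= data frame compares data_2018 with each column row by row and
# returns a logical matrix; * 2 - 1 turns TRUE/FALSE into 1/-1
vs <- (agged$data_2018 >= agged[cmp_cols]) * 2 - 1
colnames(vs) <- paste0("vs_", cmp_cols)

# Put data_2018 back in front, as in the expected output
diffs <- data.frame(data_2018 = agged$data_2018, vs)
print(diffs)
```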
Reputation: 2716
library(dplyr)
agged %>%
  mutate(vs_data_2017 = if_else(data_2018 >= data_2017, 1, -1),
         vs_pilot     = if_else(data_2018 >= pilot, 1, -1),
         vs_all       = if_else(data_2018 >= all, 1, -1),
         vs_general   = if_else(data_2018 >= general, 1, -1)) %>%
  select(data_2018, vs_data_2017, vs_pilot, vs_all, vs_general)
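Since the question also asks for a way without dplyr, here is a minimal base R sketch of the same mutate-then-select pattern: transform() adds the columns and [ ] keeps only the wanted ones. Base ifelse() stands in for dplyr::if_else(); unlike if_else() it does not check that both branches have the same type, which doesn't matter here because both are numeric. To keep it short, the data is only the first six rows of the question's data, trimmed to the columns used.

```r
# First six rows of the question's data, only the columns used here
agged <- data.frame(
  data_2018 = c(4, 4, 4, 4, 3, 4),
  data_2017 = c(1, 4, 4, 3, 2, 3),
  pilot   = c(2.68421052631579, 2.73684210526316, 3.52631578947368,
              3.42105263157895, 3.05263157894737, 2.78947368421053),
  all     = c(2.77777777777778, 2.85185185185185, 3.62962962962963,
              3.51851851851852, 3.18518518518519, 2.92592592592593),
  general = c(2.79166666666667, 2.79166666666667, 3.58333333333333,
              3.45833333333333, 3.08333333333333, 2.83333333333333)
)

# transform() evaluates the expressions with the data frame's columns in
# scope, much like mutate(); base ifelse() replaces dplyr::if_else()
diffs <- transform(agged,
  vs_data_2017 = ifelse(data_2018 >= data_2017, 1, -1),
  vs_pilot     = ifelse(data_2018 >= pilot, 1, -1),
  vs_all       = ifelse(data_2018 >= all, 1, -1),
  vs_general   = ifelse(data_2018 >= general, 1, -1)
)

# Keep only the wanted columns, like select()
diffs <- diffs[c("data_2018", "vs_data_2017", "vs_pilot", "vs_all", "vs_general")]
print(diffs)
```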
Reputation: 2399
Check this solution:
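library(dplyr)
library(stringr)  # str_c() comes from stringr, not dplyr
agged %>%
  select(data_2018, data_2017, pilot, all, general) %>%
  # columns 2:5 are data_2017, pilot, all, general after the select()
  mutate_at(2:5, ~ if_else(data_2018 >= ., 1, -1)) %>%
  rename_at(2:5, ~ str_c("vs_", .))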
Reputation: 5491
From what I understand, this might be what you want:
cbind(data_2018 = agged$data_2018,
      sapply(c("data_2017", "pilot", "all", "general"),
             # compare data_2018 with one column; TRUE/FALSE -> 1/-1
             function(col) (agged$data_2018 >= agged[[col]]) * 2 - 1))
This applies the condition to each column that should become -1 or 1. The logical TRUE/FALSE result is coerced to 1/0 by the arithmetic, and * 2 - 1 then maps 1/0 to 1/-1.
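The * 2 - 1 step is plain arithmetic on logicals. A tiny example, where cmp is a made-up vector standing in for the result of a >= comparison:

```r
# Made-up logical vector, standing in for agged$data_2018 >= some_column
cmp <- c(TRUE, FALSE, TRUE)

# Arithmetic coerces TRUE/FALSE to 1/0 ...
as_01 <- cmp * 1
# ... and doubling then subtracting 1 maps 1 -> 1 and 0 -> -1
as_pm1 <- cmp * 2 - 1

print(as_01)
print(as_pm1)
```

ifelse(cmp, 1, -1) gives the same values; the arithmetic version just avoids the function call.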