Reputation: 2977

Subsetting a data frame and running calculations in one pass

I'm trying to subset a data frame and do some basic calculations in one pass, to avoid duplicating the same function call over and over. The subsetting selects specific columns, and the calculations are simple comparisons between columns.

Here's some data:

agged <- structure(list(name = structure(1:12, .Label = c("a", 
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"), class = "factor"), data_2018 = c(4, 4, 4, 
4, 3, 4, 2, 4, 3, 4, 4, 3), 
data_2017 = c(1, 4, 4, 3, 2, 3, 3, 
4, 2, 1, 1, 2), 
pilot = c(2.68421052631579, 2.73684210526316, 
3.52631578947368, 3.42105263157895, 3.05263157894737, 2.78947368421053, 
2.21052631578947, 3.68421052631579, 2.36842105263158, 3.73684210526316, 
2.47368421052632, 2.05263157894737), 
all = c(2.77777777777778, 
2.85185185185185, 3.62962962962963, 3.51851851851852, 3.18518518518519, 
2.92592592592593, 2.2962962962963, 3.74074074074074, 2.40740740740741, 
3.77777777777778, 2.55555555555556, 2.07407407407407), 
general = c(2.79166666666667, 
2.79166666666667, 3.58333333333333, 3.45833333333333, 3.08333333333333, 
2.83333333333333, 2.41666666666667, 3.70833333333333, 2.54166666666667, 
3.79166666666667, 2.54166666666667, 2.16666666666667), 
tool = c("DoS", 
"DoS", "DoS", "DoS", "DoS", "DoS", "DoS", "DoS", "DoS", "DoS", 
"DoS", "DoS"), status = c(6, 8, 8, 6, 6, 6, 2, 8, 6, 6, 6, 6)), row.names = c(NA, 
12L), class = "data.frame")

And here's what I've tried:

diffs <- select(agged
                , agged$data_2018
                , ifelse(agged$data_2018 >= agged$data_2017, 1, -1)
                , ifelse(agged$data_2018 >= agged$pilot, 1, -1)
                , ifelse(agged$data_2018 >= agged$all, 1, -1)
                , ifelse(agged$data_2018 >= agged$general, 1, -1))

But that returns this error:

Error Each argument must yield either positive or negative integers.

The expected output would be something like:

data_2018 | vs_data_2017 | vs_pilot | vs_all | vs_general
4         |  1           |  1       | -1     |  1
4         |  1           |  1       | -1     |  1
4         | -1           |  1       |  1     |  1
4         | -1           | -1       |  1     |  1
3         |  1           | -1       |  1     |  1
4         | -1           |  1       |  1     | -1

I've tried running just the ifelse part on its own, and that returns a vector of the correct integers, so I'm not sure what else to try. Is there a way to do this without dplyr at all? I'd love to be able to do this without that package.

Upvotes: 1

Answers (4)

Sotos

Reputation: 51592

Based on your description, you can apply the comparison operator (>=) directly to the columns (dd here is your data frame), i.e.

(dd$data_2018 >= dd[3:6]) * 1

#   data_2017 pilot all general
#1          1     1   1       1
#2          1     1   1       1
#3          1     1   1       1
#4          1     1   1       1
#5          1     0   0       0
#6          1     1   1       1
#7          0     0   0       0
#8          1     1   1       1
#9          1     1   1       1
#10         1     1   1       1
#11         1     1   1       1
#12         1     1   1       1

NOTE: I converted the result to 0 and 1 instead. You can easily change that to 1 and -1.
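One possible way to get 1 and -1 rather than 0 and 1 is sketched below. It assumes a small stand-in data frame (rows 1, 5 and 7 of the question's data, rounded, with only two comparison columns) and adds the vs_ prefix from the expected output; neither is part of the original answer.

```r
# Stand-in for the question's data: rows 1, 5 and 7, values rounded (assumption)
dd <- data.frame(
  data_2018 = c(4, 3, 2),
  data_2017 = c(1, 2, 3),
  pilot     = c(2.68, 3.05, 2.21)
)

# Logical matrix: TRUE where data_2018 is at least the other column
ge <- dd$data_2018 >= dd[c("data_2017", "pilot")]

# ifelse() keeps the matrix shape and maps TRUE -> 1, FALSE -> -1
res <- ifelse(ge, 1, -1)
colnames(res) <- paste0("vs_", colnames(res))
res
```

On the full data frame the same idea would use dd[3:6] in place of the two named columns.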

Upvotes: 1

Mouad_Seridi

Reputation: 2716

library(dplyr)
agged %>% 
  mutate(vs_data_2017 =  if_else(data_2018 >= data_2017, 1, -1),
         vs_pilot     =  if_else(data_2018 >= pilot    , 1, -1),
         vs_all       =  if_else(data_2018 >= all      , 1, -1),
         vs_general   =  if_else(data_2018 >= general  , 1, -1)) %>%
  select(data_2018, vs_data_2017, vs_pilot, vs_all , vs_general)

Upvotes: 1

Paweł Chabros

Reputation: 2399

Check this solution:

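library(dplyr)
library(stringr)  # provides str_c()

agged %>%
  select(data_2018, data_2017, pilot, all, general) %>%
  # formula lambdas (~) replace funs(), which is deprecated since dplyr 0.8.0
  mutate_at(2:5, ~ if_else(data_2018 >= ., 1, -1)) %>%
  rename_at(2:5, ~ str_c('vs_', .))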

Upvotes: 1

Clemsang

Reputation: 5491

From what I understand, this might be what you want:

cbind(
  data_2018 = agged$data_2018,
  sapply(c("data_2017", "pilot", "all", "general"),
         function(col) (agged$data_2018 >= agged[[col]]) * 2 - 1)
)

This applies the comparison to each column you want to turn into -1 or 1. Each logical result (TRUE or FALSE, which count as 1 and 0 in arithmetic) is then mapped to 1 or -1 by * 2 - 1.
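As a quick illustration of that arithmetic, here is a minimal sketch on a made-up logical vector (flags is a hypothetical name, not part of the data above):

```r
# Hypothetical logical vector standing in for one column's comparison result
flags <- c(TRUE, FALSE, TRUE)

# In arithmetic, TRUE counts as 1 and FALSE as 0
as_01 <- flags * 1        # 1 0 1

# Doubling gives 2/0, and subtracting 1 shifts that to 1/-1
as_pm1 <- flags * 2 - 1   # 1 -1 1
as_pm1
```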

Upvotes: 1
