user8959427

Reputation: 2067

ifelse statement to compute if value is greater than its previous value

I am trying to compute the change from the previous day for each group.

Say I have these observations:

       cat var1 var2       date
1     male  172   68 2011-01-01
2   female   61  141 2011-01-02
3   female  211  208 2011-01-03
4    other   10   95 2011-01-04
5   female  149   49 2011-01-05

I want to add a new column that is 1 if the current var1 is greater than the previous var1 within the same group and 0 otherwise, using an ifelse statement.

       cat var1 var2       date   NEWCOL
1     male  172   68 2011-01-01     -
2   female   61  141 2011-01-02     -
3   female  211  208 2011-01-03     1   # since it is greater than 61
4   female   10   95 2011-01-04     0   # since it is less than 211
5   female  149   49 2011-01-05     1   # since it is greater than 10

Data:

tt <- seq(from = as.Date('2011-01-01', format = "%Y-%m-%d"), to = as.Date('2011-07-31', format = "%Y-%m-%d"), by = 1)

df <- data.frame(
  cat = sample(c('male', 'female', 'other'), length(tt), replace=TRUE),
  var1 = sample(length(tt), replace = TRUE),
  var2 = sample(length(tt), replace = TRUE),
  date = as.Date(tt)
)

EDIT: This does not work:

df %>% 
  mutate(var1 = as.numeric(var1)) %>% 
  group_by(cat, date) %>% 
  mutate(chng = first(var1) - last(var1))

Upvotes: 1

Views: 285

Answers (2)

Matt

Reputation: 2987

If I understand you correctly, you can sort your data by group and date, then compare each value with lag(var1) within each group:

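library(dplyr)

df %>% 
  arrange(cat, date) %>%   # order rows by group, then by date
  group_by(cat) %>%        # lag() then looks back within each group only
  mutate(chng = ifelse(var1 > lag(var1), 1, 0))  # NA for the first row of each group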

#   cat     var1  var2 date        chng
#   <fct>  <int> <int> <date>     <dbl>
# 1 female   179    21 2011-01-05    NA
# 2 female   207    57 2011-01-08     1
# 3 female   132    21 2011-01-11     0
# 4 female   142   134 2011-01-14     1
# 5 female     7   175 2011-01-15     0
# 6 female    52    44 2011-01-18     1
# 7 female    19    18 2011-01-19     0
# 8 female   129    22 2011-01-20     1
# 9 female    23    37 2011-01-22     0
#10 female   141    35 2011-01-23     1

Upvotes: 2

Zahiro Mor

Reputation: 1718

Try this.

The var1 vector:

df$var1
  [1] 212 103  43 193  29 153 164 115 136  91   9 130  72 116 102 113  28 167 210 132  14  72  82  53  13  91 201
 [28] 149 153  73  23  28   4 166 163 103   5   4 109 101  44  58  49  50  11 120   4  66  27 132  89 205 110   1
 [55] 139  73   9  34   6  29  73  47  51 105  45 101  16   3  19 212  60 144 208  53  56  35  65  31 158  83 195
 [82]  10  60  39  12 154 141 185  15 140  48   9  51  36 120 149 172 142  71  26 193  61   5 175 162 141  35 127
[109] 150 103 194 165 157 196 175  66 186 138  99 166 164 136 118  74  46  66  40  57 155 191 139 195  19 175  57
[136] 137 188  50 211  44 149  22  50  15 162 125  49 155 184 168  16 137 208 135 116 110 136 117 196  55  62  55
[163] 149  70  85  23 139 102 107 195 139  52 160 175 159   5 119  55 137 166 131 115  53 119  19  82  87  17 169
[190]  86 156 197 210  30  43 133  54 212  45  29 149 108  30 142  78  42   2  83 102  64  53 172 

Its differences (a vector of n - 1 terms):

diff(df$var1)
  [1] -109  -60  150 -164  124   11  -49   21  -45  -82  121  -58   44  -14   11  -85  139   43  -78 -118   58   10
 [23]  -29  -40   78  110  -52    4  -80  -50    5  -24  162   -3  -60  -98   -1  105   -8  -57   14   -9    1  -39
 [45]  109 -116   62  -39  105  -43  116  -95 -109  138  -66  -64   25  -28   23   44  -26    4   54  -60   56  -85
 [67]  -13   16  193 -152   84   64 -155    3  -21   30  -34  127  -75  112 -185   50  -21  -27  142  -13   44 -170
 [89]  125  -92  -39   42  -15   84   29   23  -30  -71  -45  167 -132  -56  170  -13  -21 -106   92   23  -47   91
[111]  -29   -8   39  -21 -109  120  -48  -39   67   -2  -28  -18  -44  -28   20  -26   17   98   36  -52   56 -176
[133]  156 -118   80   51 -138  161 -167  105 -127   28  -35  147  -37  -76  106   29  -16 -152  121   71  -73  -19
[155]   -6   26  -19   79 -141    7   -7   94  -79   15  -62  116  -37    5   88  -56  -87  108   15  -16 -154  114
[177]  -64   82   29  -35  -16  -62   66 -100   63    5  -70  152  -83   70   41   13 -180   13   90  -79  158 -167
[199]  -16  120  -41  -78  112  -64  -36  -40   81   19  -38  -11  119  

Now you can test which differences are positive:

diff(df$var1)>0
  [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE
 [19] FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
 [37] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE
 [55] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
 [73] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
 [91] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE
[109] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
[127]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
[145] FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
[163] FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
[181] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE
[199] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE

Since FALSE is 0 and TRUE is 1, just convert to numeric and prepend an NA, because the first value has no predecessor:

df$change <- c(NA, as.numeric(diff(df$var1) > 0))

You don't actually need ifelse, but if you prefer it you can also do:

df$change2 <- c(NA, ifelse(test = diff(df$var1) > 0, yes = 1, no = 0))
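Note that diff() here runs over the whole column, so it also compares values across different groups. If the comparison should stay within each cat, one possible base R sketch is to apply the same diff() logic per group with ave(). The small data frame below is only an assumption for illustration, with values copied from the question's expected-output table.

```r
# Illustrative data only: values copied from the question's expected-output table
df <- data.frame(
  cat  = c("male", "female", "female", "female", "female"),
  var1 = c(172, 61, 211, 10, 149),
  date = as.Date("2011-01-01") + 0:4
)

# Sort by group and date so "previous" means the previous date within a group
df <- df[order(df$cat, df$date), ]

# Within each group, flag whether a value exceeds its predecessor;
# the first row of a group has no predecessor, so it gets NA
df$change <- ave(df$var1, df$cat,
                 FUN = function(x) c(NA, as.numeric(diff(x) > 0)))

print(df)
```

With this data the female rows should get NA, 1, 0, 1 and the single male row NA, which matches the "-" placeholders in the question.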

Upvotes: 2
