Reputation: 2067
I am trying to compute the change from the previous day for each group.
Say I have these observations:
     cat var1 var2       date
1   male  172   68 2011-01-01
2 female   61  141 2011-01-02
3 female  211  208 2011-01-03
4 female   10   95 2011-01-04
5 female  149   49 2011-01-05
I want to add a new column, using an ifelse statement, that is 1 if
the current var1 is greater than the previous var1 in the same cat
and 0 otherwise.
     cat var1 var2       date NEWCOL
1   male  172   68 2011-01-01      -
2 female   61  141 2011-01-02      -
3 female  211  208 2011-01-03      1  # since it's greater than 61
4 female   10   95 2011-01-04      0  # since it's less than 211
5 female  149   49 2011-01-05      1  # since it's greater than 10
Data:
tt <- seq(from = as.Date('2011-01-01', format = "%Y-%m-%d"), to = as.Date('2011-07-31', format = "%Y-%m-%d"), by = 1)
df <- data.frame(
cat = sample(c('male', 'female', 'other'), length(tt), replace=TRUE),
var1 = sample(length(tt), replace = TRUE),
var2 = sample(length(tt), replace = TRUE),
date = as.Date(tt)
)
EDIT: This does not work:
df %>%
mutate(var1 = as.numeric(var1)) %>%
group_by(cat, date) %>%
mutate(chng = first(var1) - last(var))
Upvotes: 1
Views: 285
Reputation: 2987
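If I understand you correctly, you can sort your data by group and date, then compare each value with the previous one in its group using lag:
library(dplyr)

df %>%
  arrange(cat, date) %>%                          # order rows within each group by date
  group_by(cat) %>%                               # lag() now looks back within the same cat
  mutate(chng = ifelse(var1 > lag(var1), 1, 0))   # first row of each group is NA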
# cat var1 var2 date chng
# <fct> <int> <int> <date> <dbl>
# 1 female 179 21 2011-01-05 NA
# 2 female 207 57 2011-01-08 1
# 3 female 132 21 2011-01-11 0
# 4 female 142 134 2011-01-14 1
# 5 female 7 175 2011-01-15 0
# 6 female 52 44 2011-01-18 1
# 7 female 19 18 2011-01-19 0
# 8 female 129 22 2011-01-20 1
# 9 female 23 37 2011-01-22 0
#10 female 141 35 2011-01-23 1
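As a quick check, here is a minimal sketch that runs the same pipeline on the five rows from the question's expected-output table. The names `toy` and `toy_chng` are made up for this illustration. The first row of each group comes back as NA rather than "-", because lag has no previous value there.

```r
library(dplyr)

# Toy data: the five rows from the question's expected-output table
toy <- data.frame(
  cat  = c("male", "female", "female", "female", "female"),
  var1 = c(172, 61, 211, 10, 149),
  var2 = c(68, 141, 208, 95, 49),
  date = as.Date("2011-01-01") + 0:4
)

toy_chng <- toy %>%
  arrange(cat, date) %>%
  group_by(cat) %>%
  mutate(chng = ifelse(var1 > lag(var1), 1, 0)) %>%
  ungroup()

# After arrange(), the four female rows come first, then the male row
toy_chng$chng
# [1] NA  1  0  1 NA
```

The 1, 0, 1 for the female rows match the NEWCOL values the question asks for.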
Upvotes: 2
Reputation: 1718
Try this.
The var1 vector:
df$var1
[1] 212 103 43 193 29 153 164 115 136 91 9 130 72 116 102 113 28 167 210 132 14 72 82 53 13 91 201
[28] 149 153 73 23 28 4 166 163 103 5 4 109 101 44 58 49 50 11 120 4 66 27 132 89 205 110 1
[55] 139 73 9 34 6 29 73 47 51 105 45 101 16 3 19 212 60 144 208 53 56 35 65 31 158 83 195
[82] 10 60 39 12 154 141 185 15 140 48 9 51 36 120 149 172 142 71 26 193 61 5 175 162 141 35 127
[109] 150 103 194 165 157 196 175 66 186 138 99 166 164 136 118 74 46 66 40 57 155 191 139 195 19 175 57
[136] 137 188 50 211 44 149 22 50 15 162 125 49 155 184 168 16 137 208 135 116 110 136 117 196 55 62 55
[163] 149 70 85 23 139 102 107 195 139 52 160 175 159 5 119 55 137 166 131 115 53 119 19 82 87 17 169
[190] 86 156 197 210 30 43 133 54 212 45 29 149 108 30 142 78 42 2 83 102 64 53 172
Its differences vector (n - 1 terms):
diff(df$var1)
[1] -109 -60 150 -164 124 11 -49 21 -45 -82 121 -58 44 -14 11 -85 139 43 -78 -118 58 10
[23] -29 -40 78 110 -52 4 -80 -50 5 -24 162 -3 -60 -98 -1 105 -8 -57 14 -9 1 -39
[45] 109 -116 62 -39 105 -43 116 -95 -109 138 -66 -64 25 -28 23 44 -26 4 54 -60 56 -85
[67] -13 16 193 -152 84 64 -155 3 -21 30 -34 127 -75 112 -185 50 -21 -27 142 -13 44 -170
[89] 125 -92 -39 42 -15 84 29 23 -30 -71 -45 167 -132 -56 170 -13 -21 -106 92 23 -47 91
[111] -29 -8 39 -21 -109 120 -48 -39 67 -2 -28 -18 -44 -28 20 -26 17 98 36 -52 56 -176
[133] 156 -118 80 51 -138 161 -167 105 -127 28 -35 147 -37 -76 106 29 -16 -152 121 71 -73 -19
[155] -6 26 -19 79 -141 7 -7 94 -79 15 -62 116 -37 5 88 -56 -87 108 15 -16 -154 114
[177] -64 82 29 -35 -16 -62 66 -100 63 5 -70 152 -83 70 41 13 -180 13 90 -79 158 -167
[199] -16 120 -41 -78 112 -64 -36 -40 81 19 -38 -11 119
Now you can ask which differences are positive:
diff(df$var1)>0
[1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE
[19] FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
[37] FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
[55] FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE FALSE TRUE TRUE
[73] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
[91] FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE
[109] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
[127] TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[145] FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[163] FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
[181] FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE
[199] FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE TRUE
Since FALSE is 0 and TRUE is 1, just convert the result to numeric and add a leading NA:
df$change <- c(NA, as.numeric(diff(df$var1) > 0))
So you don't actually need ifelse, but you can also do:
df$change2 <- c(NA, ifelse(test = diff(df$var1) > 0, yes = 1, no = 0))
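Note that diff(df$var1) compares each row with the row directly above it, whatever its cat. If the comparison should stay within each group, as the question asks, one possible base R sketch applies the same diff idea per group with ave. The column name change_by_cat and the set.seed call are assumptions added here for illustration, not part of the original post.

```r
# Rebuild data shaped like the question's (set.seed added only for reproducibility)
set.seed(1)
tt <- seq(as.Date("2011-01-01"), as.Date("2011-07-31"), by = 1)
df <- data.frame(
  cat  = sample(c("male", "female", "other"), length(tt), replace = TRUE),
  var1 = sample(length(tt), replace = TRUE),
  date = tt
)

# Make sure rows are in date order, so "previous" means the previous date
df <- df[order(df$date), ]

# ave() applies FUN to var1 within each cat and returns the results in the
# original row order; the first row of every group gets NA
df$change_by_cat <- ave(
  as.numeric(df$var1), df$cat,
  FUN = function(x) c(NA, as.numeric(diff(x) > 0))
)

head(df)
```

Because ave keeps the original row order, the data frame does not need to be sorted by cat first.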
Upvotes: 2