Reputation: 131
I want to create two dummy variables: a) one that captures all negative changes in x1: if there is a negative change, it equals 1, otherwise 0.
And b) one that captures all changes of -1 or more. For example: from 10.5 to 9.5, from 10 to 9, or from 10 to 6. This one is also a dummy: if x1 drops by 1 or more, it equals 1, otherwise 0.
Since the data looks something like this, the variables should capture negative changes within each personID.
personid year x1
33 1990 0
33 1991 3.5
33 1992 2.75
33 1993 3.25
33 1994 6
34 1990 17
34 1991 9
34 1992 16.5
34 1993 16.75
For replication, use the code below.
set.seed(100)
mydata <- data.frame(
x1 = sample(c(0:30, 1.5,5.75,9.25,10.25,11.75), 100, replace = TRUE),
personID = rep(c(1:10), each = 10)
)
I tried to generate these variables using ave, but it doesn't help much. I know that I am not using it correctly, but I am not sure where.
mydata$a <- with(mydata, ave(x1, personID, FUN = function(x) c(TRUE, diff(x) !=-1) & x!=-1))
EDIT:
dput(data)
structure(list(personid = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 40L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L,
40L, 40L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 42L, 42L, 42L, 42L, 42L, 42L, 42L,
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 51L, 51L, 51L,
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L,
51L), x1 = c(37, 34, 30.75, 29, 37, 32.25, 25.75, 32.5, 27, 31,
28.5, 23.75, 25.75, 28.5, 28.5, 27.75, 25.75, 25.75, 27.25, 31,
32.5, 35.5, 27.25, 32.25, 30.5, 28.75, 29.5, 29, 29, 27, 28.75,
28.75, 25.75, 25.75, 22, 22, 29, 30, 20, 22, 12, 11.5, 10, 14.5,
24, 15.5, 23.5, 14, 24, 10, 9, 34, 16, 9.5, 19, 31, 20, 9.5,
9.5, 21, 29, 20, 26, 26, 24.5, 5, 16.5, 18.5, 22.5, 31.5, 23.5,
20, 15.25, 20.75, 32, 23.5, 25, 20, 27, 22.5, 24.5, 28.5, 18,
17.5, 18.5, 34, 30.5, 32.5, 31, 27, 31, 31, 35.5, 31, 31, 29,
31.5, 29.25, 31, 31, 28, 29)), .Names = c("personid", "x1"), class = "data.frame", row.names = c(NA,
-102L))
Upvotes: 1
Views: 688
Reputation: 7435
You can also use dplyr:
library(dplyr)
result <- mydata %>% group_by(personID) %>%
mutate(a = ifelse((x1-lag(x1)) < 0, 1, 0)) %>%
mutate(b = ifelse((x1-lag(x1)) <= -1, 1, 0))
Here, we group_by each personID. The function mutate creates your dummy variable columns a and b. Instead of using diff, test by subtracting lag(x1) from x1. The results below use your simulated data with seed=100, except that I replaced x1 with 10.5 in row 2 to illustrate a case where a is 1 but b is 0:
print(result)
##Source: local data frame [100 x 4]
##Groups: personID [10]
## x1 personID a b
## <dbl> <int> <dbl> <dbl>
##1 11 1 NA NA
##2 10.5 1 1 0
##3 19 1 0 0
##4 2 1 1 1
##5 16 1 0 0
##6 17 1 0 0
##7 29 1 0 0
##8 13 1 1 1
##9 19 1 0 0
##10 6 1 1 1
Alternatively, we can use diff to test the conditions, but we then need to prepend the result with NA so that what is returned by the function used by mutate has the same length as what is input:
result <- data %>% group_by(personid) %>%
mutate(a = c(NA, ifelse(diff(x1) < 0, 1, 0))) %>%
mutate(b = c(NA, ifelse(diff(x1) <= -1, 1, 0)))
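As a small base R illustration of why the NA is needed (a sketch only, using the four x1 values of personid 34 from the question's sample table): diff returns one element fewer than its input, and padding it with a leading NA gives the same values as subtracting a lagged copy. Here head(x, -1) with a leading NA is just a stand-in for what lag(x1) does in the dplyr version.

```r
# x1 values for personid 34 from the question's sample table
x <- c(17, 9, 16.5, 16.75)

# diff() drops one element: 4 inputs give 3 differences
n_diff <- length(diff(x))

# Pad with a leading NA so the result lines up with the original rows
padded <- c(NA, diff(x))

# Base R stand-in for x1 - lag(x1): shift x down by one position
lagged <- x - c(NA, head(x, -1))

# Both approaches give the same per-row change
same <- isTRUE(all.equal(padded, lagged))
```

Either form can then be compared against 0 or -1 to build the dummies; the first row of each person stays NA unless you decide to recode it.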
Upvotes: 0
Reputation: 226172
What you're looking for is a combination of (1) some split-apply-combine approach (tapply in base R, ddply in plyr, group_by + mutate in dplyr, ...) and (2) diff.
Data:
set.seed(100)
mydata <- data.frame(
x1 = sample(c(0:30, 1.5,5.75,9.25,10.25,11.75), 100, replace = TRUE),
personID = rep(c(1:10), each = 10)
)
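You'll have to decide what you want to do about the first (or last) value in each individual's sequence, since diff returns one fewer value than its input: should the missing entry be NA or 0? Here I'm setting the first value to zero.
diff_to_dummy <- function(x) {
  # 1 where x dropped by 1 or more relative to the previous row; first row is 0
  c(0, as.numeric(diff(x) <= -1))
}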
Now tapply applies the function to x1 for each personID; unlist puts the values back together.
dval <- with(mydata,unlist(tapply(x1,list(personID),diff_to_dummy)))
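Since the question started from ave, here is a minimal sketch of the same idea with ave instead of tapply, producing both dummies at once. It assumes the first row of each person should be 0 (swap in NA if you prefer), and it uses the nine sample rows from the question rather than the simulated data so the result can be checked by hand. The name sample_data is only for this illustration. Note that unlist(tapply(...)) returns values in group order, which matches the row order only when the data are sorted by person, whereas ave returns them in the original row order.

```r
# Sample rows from the question's table
sample_data <- data.frame(
  personid = c(33, 33, 33, 33, 33, 34, 34, 34, 34),
  year     = c(1990:1994, 1990:1993),
  x1       = c(0, 3.5, 2.75, 3.25, 6, 17, 9, 16.5, 16.75)
)

# a: 1 if x1 decreased at all relative to the previous row of the same person
sample_data$a <- ave(sample_data$x1, sample_data$personid,
                     FUN = function(x) c(0, as.numeric(diff(x) < 0)))

# b: 1 if x1 decreased by 1 or more relative to the previous row
sample_data$b <- ave(sample_data$x1, sample_data$personid,
                     FUN = function(x) c(0, as.numeric(diff(x) <= -1)))

print(sample_data)
```

For person 33 the only decrease is 3.5 to 2.75 (a change of -0.75), so a is 1 and b is 0 on that row; for person 34 the drop from 17 to 9 sets both a and b to 1.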
Upvotes: 2