Reputation: 521
I have a dataframe which looks something like this:
set.seed(100)
library(dplyr)
df <- tibble(ID = rep(1:4, each = 2),
             weight = rep(abs(rnorm(4, 5, 3)), each = 2),
             year = rep(2013:2014, 4),
             var1 = sample(1:5, 8, replace = TRUE),
             var2 = sample(1:5, 8, replace = TRUE))
This produces data which looks like this:
# A tibble: 8 x 5
     ID   weight  year  var1  var2
  <int>    <dbl> <int> <int> <int>
1     1 3.493423  2013     3     2
2     1 3.493423  2014     1     2
3     2 5.394593  2013     4     2
4     2 5.394593  2014     5     4
5     3 4.763249  2013     2     3
6     3 4.763249  2014     2     4
7     4 7.660354  2013     4     3
8     4 7.660354  2014     4     4
I want to draw quick, simple inferences about how things change from one year to the next. The ID variable is a unique identifier for each person in my longitudinal sample.
My idea would be to use group_by(ID) to group my data by ID, and then perhaps make use of the summarise function in some way. I want the "collapse" effect we see when we use the summarise function, i.e. one row per person.
For example, say I want to see if var1 remains the same across the two years, by person. We see above that this is true of persons 3 and 4. I would like to be able to obtain the following dataframe:
# A tibble: 4 x 3
     ID   weight indicator
  <int>    <dbl>     <lgl>
1     1 3.493423     FALSE
2     2 5.394593     FALSE
3     3 4.763249      TRUE
4     4 7.660354      TRUE
Or, say I wanted to see the difference in var2 from 2013 to 2014, I would want the following dataframe:
# A tibble: 4 x 3
     ID   weight diff_var2
  <int>    <dbl>     <dbl>
1     1 3.493423         0
2     2 5.394593         2
3     3 4.763249         1
4     4 7.660354         1
Does anyone have any ideas on how to go about this? I don't know how this would generalise to more years of data, but for the time being I am simply working with two years of longitudinal data.
Ultimately, for example, I would like to know the weighted proportion of people whose var1 does not change, or the weighted mean movement in var2, etc. These are just some examples of the sorts of queries I am looking into.
Upvotes: 0
Views: 930
Reputation: 887193
We can use data.table
library(data.table)
setDT(df)[, .(indicator=uniqueN(var1)==1, diff_var2= diff(var2)), ID]
#    ID indicator diff_var2
#1:  1     FALSE         0
#2:  2     FALSE         2
#3:  3      TRUE         1
#4:  4      TRUE         1
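The result above drops the weight column that the question asks to keep. One way to carry it through, sketched here under the assumption that weight is constant within each ID (as it is in the example data), is to group by both columns. The values below are typed in from the table in the question rather than regenerated from the seed, and the names dt and res_dt are only illustrative.

```r
library(data.table)

# Data copied from the table printed in the question
dt <- data.table(
  ID     = rep(1:4, each = 2),
  weight = rep(c(3.493423, 5.394593, 4.763249, 7.660354), each = 2),
  year   = rep(2013:2014, 4),
  var1   = c(3L, 1L, 4L, 5L, 2L, 2L, 4L, 4L),
  var2   = c(2L, 2L, 2L, 4L, 3L, 4L, 3L, 4L)
)

# Grouping by ID and weight keeps weight as a column in the result
res_dt <- dt[, .(indicator = uniqueN(var1) == 1L,  # TRUE if var1 never changes
                 diff_var2 = diff(var2)),          # 2014 value minus 2013 value
             by = .(ID, weight)]
res_dt
```

Because weight does not vary within a person, adding it to the grouping does not change the number of groups; it only makes the column available for later weighted summaries.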
Upvotes: 0
Reputation: 43344
You've pretty much already laid out what you need to do. Group by both ID and weight if you want to keep the weight column in the result.
df %>% group_by(ID, weight) %>%
    summarise(indicator = n_distinct(var1) == 1,   # TRUE if var1 never changes
              diff_var2 = diff(var2))              # second year minus first year
## Source: local data frame [4 x 4]
## Groups: ID [?]
##
##      ID   weight indicator diff_var2
##   <int>    <dbl>     <lgl>     <int>
## 1     1 3.493423     FALSE         0
## 2     2 5.394593     FALSE         2
## 3     3 4.763249      TRUE         1
## 4     4 7.660354      TRUE         1
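If you have more than two years or missing data, you may need a more robust approach: with more than two years, diff(var2) returns several values per group and depends on the rows being sorted by year, and missing values need explicit handling.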
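As a rough sketch of what a more robust version might look like, the code below sorts by year, takes the net change as last minus first so that it still returns one value per person with more than two years, and then computes the weighted summaries mentioned in the question with base R's weighted.mean(). The data are typed in from the table in the question, and the names df2, res and wtd are just illustrative. Treating "movement" as last minus first is an assumption; other definitions are possible.

```r
library(dplyr)

# Data copied from the table printed in the question
df2 <- tibble(
  ID     = rep(1:4, each = 2),
  weight = rep(c(3.493423, 5.394593, 4.763249, 7.660354), each = 2),
  year   = rep(2013:2014, 4),
  var1   = c(3L, 1L, 4L, 5L, 2L, 2L, 4L, 4L),
  var2   = c(2L, 2L, 2L, 4L, 3L, 4L, 3L, 4L)
)

res <- df2 %>%
  arrange(ID, year) %>%       # so first()/last() mean earliest/latest year
  group_by(ID, weight) %>%
  summarise(
    indicator = n_distinct(var1) == 1,      # TRUE only if var1 never changes
    diff_var2 = last(var2) - first(var2),   # net change, one value per person
    .groups = "drop"
  )

# Weighted proportion with unchanged var1 and weighted mean change in var2
wtd <- res %>%
  summarise(
    prop_unchanged = weighted.mean(indicator, weight),
    mean_diff_var2 = weighted.mean(diff_var2, weight)
  )
wtd
```

If var1 can be missing, note that n_distinct() counts NA as a value of its own unless na.rm = TRUE is passed, so how missing years should be treated needs to be decided explicitly.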
Upvotes: 3