Edu
Edu

Reputation: 903

Recoding longitudinal variables in R

I have a longitudinal data frame prueba that follows different units (variable LA) over time (variables timeand year). The first 25 observations have the following structure.

> head(prueba, 25)
                     LA month year entry exit total homes
1  Barking and Dagenham    10 2010     2    0     2    NA
2  Barking and Dagenham    11 2010     3    0     3    NA
3  Barking and Dagenham    12 2010     3    0     3    15
4  Barking and Dagenham     1 2011     6    0     6    NA
5  Barking and Dagenham     2 2011     1    0     1    NA
6  Barking and Dagenham     3 2011     2    0     2    NA
7  Barking and Dagenham     4 2011     1    0     1    NA
8  Barking and Dagenham    10 2011     1    0     1    NA
9  Barking and Dagenham    11 2011     1    0     1    NA
10 Barking and Dagenham     1 2012     1    0     1    NA
11 Barking and Dagenham     9 2012     1    0     1    NA
12 Barking and Dagenham     6 2013     2    0     2    NA
13 Barking and Dagenham     1 2014     0    1    -1    NA
14 Barking and Dagenham    12 2014     0    1    -1    NA
15 Barking and Dagenham     3 2015     1    1     0    NA
16 Barking and Dagenham    11 2015     1    1     0    NA
17 Barking and Dagenham    12 2015     1    0     1    NA
18               Barnet    11 2010    24    0    24    NA
19               Barnet    12 2010    28    0    28    86
20               Barnet     1 2011    28    0    28    NA
21               Barnet     2 2011     6    0     6    NA
22               Barnet     3 2011     1    0     1    NA
23               Barnet     4 2011     1    0     1    NA
24               Barnet     7 2011     2    0     2    NA
25               Barnet     8 2011     1    0     1    NA

My aim is to recode the homes variable by assigning the values that are not missing to the observations where month == "2" and year == "2011". In case there are not observations for these values of monthand year, the relabelled observations would be those corresponding to month == "1"and year == "2011". Ideally, the expected output would be something like this:

> head(prueba, 25)
                     LA month year entry exit total homes
1  Barking and Dagenham    10 2010     2    0     2    NA
2  Barking and Dagenham    11 2010     3    0     3    NA
3  Barking and Dagenham    12 2010     3    0     3    NA
4  Barking and Dagenham     1 2011     6    0     6    NA
5  Barking and Dagenham     2 2011     1    0     1    15
6  Barking and Dagenham     3 2011     2    0     2    NA
7  Barking and Dagenham     4 2011     1    0     1    NA
8  Barking and Dagenham    10 2011     1    0     1    NA
9  Barking and Dagenham    11 2011     1    0     1    NA
10 Barking and Dagenham     1 2012     1    0     1    NA
11 Barking and Dagenham     9 2012     1    0     1    NA
12 Barking and Dagenham     6 2013     2    0     2    NA
13 Barking and Dagenham     1 2014     0    1    -1    NA
14 Barking and Dagenham    12 2014     0    1    -1    NA
15 Barking and Dagenham     3 2015     1    1     0    NA
16 Barking and Dagenham    11 2015     1    1     0    NA
17 Barking and Dagenham    12 2015     1    0     1    NA
18               Barnet    11 2010    24    0    24    NA
19               Barnet    12 2010    28    0    28    NA
20               Barnet     1 2011    28    0    28    NA
21               Barnet     2 2011     6    0     6    86
22               Barnet     3 2011     1    0     1    NA
23               Barnet     4 2011     1    0     1    NA
24               Barnet     7 2011     2    0     2    NA
25               Barnet     8 2011     1    0     1    NA

I have approached this by using data.table on the following basis:

test = data.table(prueba)
setkey(test, LA)
test$homes =test[, .SD[, ifelse(year == "2011" & month == "2", !is.na(homes), homes)], by=LA]

But it is not yielding the expected output.

> head(test, 25)
                      LA month year entry exit total homes
 1: Barking and Dagenham    10 2010     2    0     2    NA
 2: Barking and Dagenham    11 2010     3    0     3    NA
 3: Barking and Dagenham    12 2010     3    0     3    15
 4: Barking and Dagenham     1 2011     6    0     6    NA
 5: Barking and Dagenham     2 2011     1    0     1    NA
 6: Barking and Dagenham     3 2011     2    0     2    NA
 7: Barking and Dagenham     4 2011     1    0     1    NA
 8: Barking and Dagenham    10 2011     1    0     1    NA
 9: Barking and Dagenham    11 2011     1    0     1    NA
10: Barking and Dagenham     1 2012     1    0     1    NA
11: Barking and Dagenham     9 2012     1    0     1    NA
12: Barking and Dagenham     6 2013     2    0     2    NA
13: Barking and Dagenham     1 2014     0    1    -1    NA
14: Barking and Dagenham    12 2014     0    1    -1    NA
15: Barking and Dagenham     3 2015     1    1     0    NA
16: Barking and Dagenham    11 2015     1    1     0    NA
17: Barking and Dagenham    12 2015     1    0     1    NA
18:               Barnet    11 2010    24    0    24    NA
19:               Barnet    12 2010    28    0    28    86
20:               Barnet     1 2011    28    0    28    NA
21:               Barnet     2 2011     6    0     6    NA
22:               Barnet     3 2011     1    0     1    NA
23:               Barnet     4 2011     1    0     1    NA
24:               Barnet     7 2011     2    0     2    NA
25:               Barnet     8 2011     1    0     1    NA
                      LA month year entry exit total homes

I would be grateful if someone can suggest an alternative approach to this - not necessarily wit data.table.

Upvotes: 0

Views: 194

Answers (1)

akash87
akash87

Reputation: 3994

library(dplyr)
dfs <- data.frame(df %>% 
                  group_by(LA) %>% 
                  summarise(Homes = sum(homes, na.rm = T)) %>%
                  inner_join(.,df, by = 'LA') %>% 
                  mutate(Homes = ifelse(month == 2 & year == 2011, Homes, NA)))

This should do the trick, and using the dplyr package has a lot of speed as opposed to doing it iteratively (e.g. for or while).

Upvotes: 1

Related Questions