Reputation: 903
I have a data frame with a similar longitudinal structure as data
:
data = data.frame (
ID = c("a","a","a","b","b","b","c","c", "c"),
period = c(1,2,3,1,2,3,1,2,3),
size = c(3,3,NA, NA, NA,1, 14,14, 14))
The values of the variable size
are fixed so that each period has the same value for size
. Yet some observations have missing values. My aim consists of replacing these missing values
with the value of size
associated with the periods where there is no missing (e.g. 3 for ID
"a" and 1 for ID
"b").
The desired data frame should look something similar to:
data.1
ID period value
a 1 3
a 2 3
a 3 3
b 1 1
b 2 1
b 3 1
c 1 14
c 2 14
c 3 14
I have tried different combinations of the formula below but I don't get the result I am looking for.
library(dplyr)
data.1 = data %>% group_by(ID) %>%
mutate(new.size = ifelse(is.na(size), !is.na(size),
ifelse(!is.na(size), size, 0)))
That yields the following:
data.1
Source: local data frame [9 x 4]
Groups: ID [3]
ID period size new.size
(fctr) (dbl) (dbl) (dbl)
1 a 1 3 3
2 a 2 3 3
3 a 3 NA 0
4 b 1 NA 0
5 b 2 NA 0
6 b 3 1 1
7 c 1 14 14
8 c 2 14 14
9 c 3 14 14
I would be grateful if someone could give me a hint on how to get the right solution.
Upvotes: 1
Views: 328
Reputation: 389095
To correct your code, you can try the following with dplyr
library(dplyr)
data %>% group_by(ID) %>%
mutate(new.size = ifelse(is.na(size), size[!is.na(size)],size))
# ID period size new.size
# (fctr) (dbl) (dbl) (dbl)
#1 a 1 3 3
#2 a 2 3 3
#3 a 3 NA 3
#4 b 1 NA 1
#5 b 2 NA 1
#6 b 3 1 1
#7 c 1 14 14
#8 c 2 14 14
#9 c 3 14 14
Or a base R
alternative with ave
data$new.size <- ave(data$size,data$ID, FUN=function(x)unique(x[!is.na(x)]))
data$new.size
#[1] 3 3 3 1 1 1 14 14 14
Upvotes: 3
Reputation: 5314
here another solution using dplyr
with na.omit
group_by(data, ID) %>%
mutate(value=na.omit(size)[1])
Source: local data frame [9 x 4]
Groups: ID [3]
ID period size value
<fctr> <dbl> <dbl> <dbl>
1 a 1 3 3
2 a 2 3 3
3 a 3 NA 3
4 b 1 NA 1
5 b 2 NA 1
6 b 3 1 1
7 c 1 14 14
8 c 2 14 14
9 c 3 14 14
note that you can replace na.omit
with max(size, na.rm=TRUE)
if you are looking for maximum for example.
Upvotes: 3
Reputation: 2715
How about this with base R:
vals <- unique(na.omit(data[, c("ID", "size")]))
data$size <- vals$size[match(data$ID, vals$ID)]
ID period size
1 a 1 3
2 a 2 3
3 a 3 3
4 b 1 1
5 b 2 1
6 b 3 1
7 c 1 14
8 c 2 14
9 c 3 14
Upvotes: 3