Let's assume I have the following data:
library(dplyr)
library(tidyr)  # expand_grid() comes from tidyr, not dplyr

mydata <- expand_grid(
  sex = c("male", "female"),
  age = 20:30,
  employed = c(TRUE, FALSE),
  value1 = (1:5)^2,
  value2 = sqrt(1:5)
) %>%
  mutate(row_num = row_number()) %>%
  select(row_num, everything())
Which would look like this:
# A tibble: 1,100 × 6
   row_num sex     age employed value1 value2
     <int> <chr> <int> <lgl>     <dbl>  <dbl>
 1       1 male     20 TRUE          1   1
 2       2 male     20 TRUE          1   1.41
 3       3 male     20 TRUE          1   1.73
 4       4 male     20 TRUE          1   2
 5       5 male     20 TRUE          1   2.24
 6       6 male     20 TRUE          4   1
 7       7 male     20 TRUE          4   1.41
 8       8 male     20 TRUE          4   1.73
 9       9 male     20 TRUE          4   2
10      10 male     20 TRUE          4   2.24
Let's now assume that rows three and eight went missing for some reason, i.e.
mydata %>% slice(-3,-8)
# A tibble: 1,098 × 6
   row_num sex     age employed value1 value2
     <int> <chr> <int> <lgl>     <dbl>  <dbl>
 1       1 male     20 TRUE          1   1
 2       2 male     20 TRUE          1   1.41
 3       4 male     20 TRUE          1   2
 4       5 male     20 TRUE          1   2.24
 5       6 male     20 TRUE          4   1
 6       7 male     20 TRUE          4   1.41
 7       9 male     20 TRUE          4   2
 8      10 male     20 TRUE          4   2.24
 9      11 male     20 TRUE          9   1
10      12 male     20 TRUE          9   1.41
...but there could be missing rows in other categories (age, sex, employed) as well.
How could I linearly interpolate the missing rows (namely the values in columns value1 and value2) and make mydata complete again? In other words, I would like to fill in row 3 using the information from rows 2 and 4, and row 8 using the information from rows 7 and 9, by taking the mean of the two neighbouring values.
(I am well aware that the logic behind columns value1 and value2 was not linear to begin with.)
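To make the desired result concrete, here is a minimal sketch of the kind of output I am after. It rests on several assumptions: the missing rows lie strictly between the smallest and largest surviving row_num, interpolation runs along row_num, and the categorical columns can simply be copied from the previous row (which would be wrong whenever a missing row is the first row of a new sex/age/employed group). The names incomplete and restored are only illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data and drop rows 3 and 8, as in the question
mydata <- expand_grid(
  sex = c("male", "female"),
  age = 20:30,
  employed = c(TRUE, FALSE),
  value1 = (1:5)^2,
  value2 = sqrt(1:5)
) %>%
  mutate(row_num = row_number()) %>%
  select(row_num, everything())

incomplete <- mydata %>% slice(-3, -8)

restored <- incomplete %>%
  # Re-create every row_num between min and max; new rows are all NA
  complete(row_num = full_seq(row_num, 1)) %>%
  arrange(row_num) %>%
  # Linear interpolation along row_num; approx() ignores the NA points
  mutate(across(c(value1, value2),
                ~ approx(row_num, .x, xout = row_num)$y)) %>%
  # Assumption: a missing row shares its categories with the row above it
  fill(sex, age, employed, .direction = "down")

restored %>% slice(c(3, 8))
```

With a single missing row between two known neighbours, the interpolated value is just the mean of those neighbours, so row 3 gets value2 = (sqrt(2) + 2) / 2 rather than the original sqrt(3).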