Reputation: 45
For simplicity reasons, let's assume I have two columns. First: ID (string of codes such as AA23, BA53, NA, etc.) Second: Age (18, 32, 55, 23, etc.)
And IDs sometimes repeat (i.e., one person - AA23 filled the survey in two days, but only on the first day was asked how old he is, but during the second and third day not).
I want to copy paste values from the Age column based on the ID, so that I have a 'long format' of the dataframe.
dput(data):
structure(list(Code = c("MW68", "AW80", "EW40", "BW60", "Wn36",
"ZK45", "SI55", "MW68", "EW40", "DC06", NA, "IW28"), Age = c("52",
"26", "34", "26", "20", "35", NA, NA, NA, NA, NA, NA)), row.names = c(5L,
6L, 7L, 8L, 9L, 10L, 400L, 401L, 402L, 403L, 404L, 405L), class = "data.frame")
Input:
ID Age
AA23 18
BA53 32
AC13 55
AA23 NA
BA53 NA
AC13 NA
NA 23
AA23 NA
(the trick is that sometimes ID is NA)
And the desired output:
ID Age
AA23 18
BA53 32
AC13 55
AA23 18
BA53 32
AC13 55
NA 23
AA23 18
Thank you in advance!
Upvotes: 1
Views: 1650
Reputation: 21918
You can also use the function coalesce
which finds the first NA
value and replace it with the value you define, here we would like it to be the first value of every Age
variable (grouping variable):
library(dplyr)
df %>%
group_by(Code) %>%
mutate(across(Age, ~ coalesce(.x, first(.x))))
# A tibble: 12 x 2
# Groups: Code [10]
Code Age
<chr> <chr>
1 MW68 52
2 AW80 26
3 EW40 34
4 BW60 26
5 Wn36 20
6 ZK45 35
7 SI55 NA
8 MW68 52
9 EW40 34
10 DC06 NA
11 NA NA
12 IW28 NA
Upvotes: 1
Reputation: 21400
Here's a solution based on zoo
's function na.locf
("in the case of NA, last observation carried forward"): first you group by Code
then you mutate column
Ageusing
ifelse and carrying the last non-
NA` observation forward:
library(zoo)
data %>%
group_by(Code) %>%
mutate(Age = ifelse(is.na(Age), na.locf(Age), Age))
# A tibble: 12 x 2
# Groups: Code [10]
Code Age
<chr> <chr>
1 MW68 52
2 AW80 26
3 EW40 34
4 BW60 26
5 Wn36 20
6 ZK45 35
7 SI55 NA
8 MW68 52 # <- value `carried forward`
9 EW40 34 # <- value `carried forward`
10 DC06 NA
11 NA NA
12 IW28 NA
Upvotes: 0
Reputation: 1175
I'm not quite sure if I understood correctly what you want to do, but this code here should look where Age is NA
and fill in the mean of the Age from the other rows with the same entry in Code. Obviously, this will fail if there are values for Code where no Age value exists anywhere in the table. If there are various values for Age in different rows with the same Code, it will fill in the mean in this example, since you didn't specify what to do in such a case.
for(i in 1:nrow(data)){
if(!is.na(data$Code[i])){
if(is.na(data$Age[i])){
data$Age[i] <- mean(data$Age[data$Code == data$Code[i]], na.rm = TRUE)
}
}
}
This skips rows with NA
in the Code column.
Upvotes: 1