chillos
chillos

Reputation: 45

(R) How to copy paste values from one column based on another column and ID in R

For simplicity reasons, let's assume I have two columns. First: ID (string of codes such as AA23, BA53, NA, etc.) Second: Age (18, 32, 55, 23, etc.)

And IDs sometimes repeat (i.e., one person - AA23 filled the survey in two days, but only on the first day was asked how old he is, but during the second and third day not).

I want to copy paste values from the Age column based on the ID, so that I have a 'long format' of the dataframe.

dput(data):

structure(list(Code = c("MW68", "AW80", "EW40", "BW60", "Wn36", 
"ZK45", "SI55", "MW68", "EW40", "DC06", NA, "IW28"), Age = c("52", 
"26", "34", "26", "20", "35", NA, NA, NA, NA, NA, NA)), row.names = c(5L, 
6L, 7L, 8L, 9L, 10L, 400L, 401L, 402L, 403L, 404L, 405L), class = "data.frame")

Input:

ID   Age
AA23 18
BA53 32
AC13 55
AA23 NA
BA53 NA  
AC13 NA
NA   23
AA23 NA
(the trick is that sometimes ID is NA)

And the desired output:
ID   Age
AA23 18
BA53 32
AC13 55
AA23 18
BA53 32  
AC13 55
NA   23
AA23 18

Thank you in advance!

Upvotes: 1

Views: 1650

Answers (3)

Anoushiravan R
Anoushiravan R

Reputation: 21918

You can also use the function coalesce which finds the first NA value and replace it with the value you define, here we would like it to be the first value of every Age variable (grouping variable):

library(dplyr)

df %>%
  group_by(Code) %>%
  mutate(across(Age, ~ coalesce(.x, first(.x))))

# A tibble: 12 x 2
# Groups:   Code [10]
   Code  Age  
   <chr> <chr>
 1 MW68  52   
 2 AW80  26   
 3 EW40  34   
 4 BW60  26   
 5 Wn36  20   
 6 ZK45  35   
 7 SI55  NA   
 8 MW68  52   
 9 EW40  34   
10 DC06  NA   
11 NA    NA   
12 IW28  NA 

Upvotes: 1

Chris Ruehlemann
Chris Ruehlemann

Reputation: 21400

Here's a solution based on zoo's function na.locf("in the case of NA, last observation carried forward"): first you group by Codethen you mutate column Ageusingifelse and carrying the last non-NA` observation forward:

library(zoo)
data %>%
  group_by(Code) %>%
  mutate(Age = ifelse(is.na(Age), na.locf(Age), Age))
# A tibble: 12 x 2
# Groups:   Code [10]
   Code  Age  
   <chr> <chr>
 1 MW68  52   
 2 AW80  26   
 3 EW40  34   
 4 BW60  26   
 5 Wn36  20   
 6 ZK45  35   
 7 SI55  NA   
 8 MW68  52   # <- value `carried forward`
 9 EW40  34   # <- value `carried forward`
10 DC06  NA   
11 NA    NA   
12 IW28  NA  

Upvotes: 0

Manuel Popp
Manuel Popp

Reputation: 1175

I'm not quite sure if I understood correctly what you want to do, but this code here should look where Age is NA and fill in the mean of the Age from the other rows with the same entry in Code. Obviously, this will fail if there are values for Code where no Age value exists anywhere in the table. If there are various values for Age in different rows with the same Code, it will fill in the mean in this example, since you didn't specify what to do in such a case.

for(i in 1:nrow(data)){
  if(!is.na(data$Code[i])){
    if(is.na(data$Age[i])){
      data$Age[i] <- mean(data$Age[data$Code == data$Code[i]], na.rm = TRUE)
    }
  }
}

This skips rows with NA in the Code column.

Upvotes: 1

Related Questions