Andres Mora
Andres Mora

Reputation: 1106

How to fill NAs based on a grouped variable?

My test data:

tableid  id    name    weight   weightdate
1        1    david    100      01/01/2020
2        1    david    100      01/01/2020
3        1    david    NA       NA
4        2    anne     NA       NA
5        3    peter    150      02/10/2020
6        3    peter    150      02/10/2020

I have some records of a same person (in this case david), which have both valid (100 and 01/01/2020) and NA data in weight and weightdate. Im looking to normalize the NA by replacing them with valid data from that person. For the case of anne who doesnt have any valid data, i would leave it as is. I was thinking about using fill.

My desired output:

tableid  id    name    weight   weightdate
1        1    david    100      01/01/2020
2        1    david    100      01/01/2020
3        1    david    100      01/01/2020
4        2    anne     NA       NA
5        3    peter    150      02/10/2020
6        3    peter    150      02/10/2020

Upvotes: 4

Views: 58

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 101064

A data.table option

> setDT(df)[, lapply(.SD, function(x) unique(na.omit(x))), name]
    name tableid id weight weightdate
1: david       1  1    100 01/01/2020
2: david       2  1    100 01/01/2020
3: david       3  1    100 01/01/2020
4:  anne       4  2     NA       <NA>
5: peter       5  3    150 02/10/2020
6: peter       6  3    150 02/10/2020

Upvotes: 1

Andre Wildberg
Andre Wildberg

Reputation: 19088

You can use fill() function from tidyr package:

library(tidyr)

df1 %>% group_by(name) %>% fill(weight, weightdate) %>% ungroup
# A tibble: 6 x 5
  tableid    id name  weight weightdate
    <int> <int> <chr>  <int> <chr>     
1       1     1 david    100 01/01/2020
2       2     1 david    100 01/01/2020
3       3     1 david    100 01/01/2020
4       4     2 anne      NA NA        
5       5     3 peter    150 02/10/2020
6       6     3 peter    150 02/10/2020

Data

df1 <- structure(list(tableid = 1:6, id = c(1L, 1L, 1L, 2L, 3L, 3L), 
    name = c("david", "david", "david", "anne", "peter", "peter"
    ), weight = c(100L, 100L, NA, NA, 150L, 150L), weightdate = c("01/01/2020", 
    "01/01/2020", NA, NA, "02/10/2020", "02/10/2020")), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 3

TarJae
TarJae

Reputation: 78917

This is an alternative. Andre Wildberg already provided my favorite:

library(dplyr)
df %>% 
  group_by(name) %>% 
  mutate(across(everything(), ~ifelse(is.na(.), lag(.),.)))
  tableid    id name  weight weightdate
    <int> <int> <chr>  <int> <chr>     
1       1     1 david    100 01/01/2020
2       2     1 david    100 01/01/2020
3       3     1 david    100 01/01/2020
4       4     2 anne      NA NA        
5       5     3 peter    150 02/10/2020
6       6     3 peter    150 02/10/2020

data:

structure(list(tableid = 1:6, id = c(1L, 1L, 1L, 2L, 3L, 3L), 
name = c("david", "david", "david", "anne", "peter", "peter"
), weight = c(100L, 100L, NA, NA, 150L, 150L), weightdate = c("01/01/2020", 
"01/01/2020", NA, NA, "02/10/2020", "02/10/2020")), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 3

Related Questions