Shruthi Patil
Shruthi Patil

Reputation: 87

Replace NAs of a particular variable with a particular value

I have a dataset which looks like

Date       temperature   sysid      appliance_name 
01.03.18   13            abcd123    silverstar 320
02.03.18   15            abcd123    silverstar 320
03.03.18   17            abcd123    silverstar 320
04.03.18   15            abcd123    silverstar 320
01.03.18   18            bcfw5678   silverstar erp30i
02.03.18   14            bcfw5678   NA
03.03.18   18            bcfw5678   silverstar erp30i
04.03.18   10            bcfw5678   silverstar erp30i
01.03.18   16            ygft3456   NA
02.03.18   13            ygft3456   silverstar 30i
03.03.18   12            ygft3456   silverstar 30i
04.03.18   18            ygft3456   silverstar 30i

It is clear for me to fill the NAs with the same appliance_name as the appliance_name of a record with same sysid. For example, the record 02.03.18 14 bcfw5678 NA should have the NA replaced with silverstar erp30i because from the other records it is clear that this particular sysid belongs to silverstar erp30i. I cannot use do.locf because it is possible that the previous record belongs to another appliance group.

What are the ways to treat NAs in this case?

Upvotes: 1

Views: 45

Answers (2)

Alan Gómez
Alan Gómez

Reputation: 378

A Base R Solution is:

DATA

df <- structure(list(Date = c("01.03.18", "02.03.18", "03.03.18", "04.03.18", 
"01.03.18", "02.03.18", "03.03.18", "04.03.18", "01.03.18", "02.03.18", 
"03.03.18", "04.03.18"), temperature = c(13L, 15L, 17L, 15L, 
18L, 14L, 18L, 10L, 16L, 13L, 12L, 18L), sysid = c("abcd123", 
"abcd123", "abcd123", "abcd123", "bcfw5678", "bcfw5678", "bcfw5678", 
"bcfw5678", "ygft3456", "ygft3456", "ygft3456", "ygft3456"), 
    appliance_name = c("silverstar_320", "silverstar_320", "silverstar_320", 
    "silverstar_320", "silverstar_erp30i", NA, "silverstar_erp30i", 
    "silverstar_erp30i", NA, "silverstar_30i", "silverstar_30i", 
    "silverstar_30i")), class = "data.frame", row.names = c(NA, 
-12L))

PROCEDURE

list1 <- by(df[,4], df[,3], function(x) unique(na.omit(x)))
df2 <- data.frame(names(list1), as.vector(list1))

df[is.na(df[,4]) == T, 4] <- df2[df2[,1] %in% df[is.na(df[,4]) == T,3], 2]

OUTPUT

       Date temperature    sysid    appliance_name
1  01.03.18          13  abcd123    silverstar_320
2  02.03.18          15  abcd123    silverstar_320
3  03.03.18          17  abcd123    silverstar_320
4  04.03.18          15  abcd123    silverstar_320
5  01.03.18          18 bcfw5678 silverstar_erp30i
6  02.03.18          14 bcfw5678 silverstar_erp30i
7  03.03.18          18 bcfw5678 silverstar_erp30i
8  04.03.18          10 bcfw5678 silverstar_erp30i
9  01.03.18          16 ygft3456    silverstar_30i
10 02.03.18          13 ygft3456    silverstar_30i
11 03.03.18          12 ygft3456    silverstar_30i
12 04.03.18          18 ygft3456    silverstar_30i

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388962

One way to handle this is by replacing the NA's by first non-NA value in each group (sysid).

library(dplyr)

df %>%
  group_by(sysid) %>%
  mutate(appliance_name = replace(appliance_name, is.na(appliance_name), 
                        appliance_name[which.max(!is.na(appliance_name))]))



#    Date     temperature sysid    appliance_name  
#   <fct>          <int> <fct>    <fct>           
# 1 01.03.18          13 abcd123  silverstar320   
# 2 02.03.18          15 abcd123  silverstar320   
# 3 03.03.18          17 abcd123  silverstar320   
# 4 04.03.18          15 abcd123  silverstar320   
# 5 01.03.18          18 bcfw5678 silverstarerp30i
# 6 02.03.18          14 bcfw5678 silverstarerp30i
# 7 03.03.18          18 bcfw5678 silverstarerp30i
# 8 04.03.18          10 bcfw5678 silverstarerp30i
# 9 01.03.18          16 ygft3456 silverstar30i   
#10 02.03.18          13 ygft3456 silverstar30i   
#11 03.03.18          12 ygft3456 silverstar30i   
#12 04.03.18          18 ygft3456 silverstar30i   

Upvotes: 1

Related Questions