Reputation: 257
I want to replace NA with group value (not mean or median, since some columns are character or factor) and I want to do this for a list of columns using the same group for all.
Sample data:
ID <- c(1,1,1,2,2,2,3,3)
V1 <- c(NA,"HEJ",NA,"BOS","BOS",NA,"IB","IB")
V2 <- c(700,700,NA,3000,NA,NA,NA,1000)
V3 <- c(NA,NA,3,1,1,1,2,NA)
V4 <- c(NA,NA,NA,NA,NA,"LA",NA,"FE")
mydf <- data.frame(ID,V1,V2,V3,V4)
> mydf
ID V1 V2 V3 V4
1 1 <NA> 700 NA <NA>
2 1 HEJ 700 NA <NA>
3 1 <NA> NA 3 <NA>
4 2 BOS 3000 1 <NA>
5 2 BOS NA 1 <NA>
6 2 <NA> NA 1 LA
7 3 IB NA 2 <NA>
8 3 IB 1000 NA FE
So I know if I only wanted to do this for one column I would:
setDT(mydf)[, V1:=
V1[!is.na(V1)][1L],
by = ID]
And then get:
> mydf
ID V1 V2 V3 V4
1: 1 HEJ 700 NA <NA>
2: 1 HEJ 700 NA <NA>
3: 1 HEJ NA 3 <NA>
4: 2 BOS 3000 1 <NA>
5: 2 BOS NA 1 <NA>
6: 2 BOS NA 1 LA
7: 3 IB NA 2 <NA>
8: 3 IB 1000 NA FE
But I have a lot of columns, so I need to put it in a loop of some kind.
Desired output:
> mydf
ID V1 V2 V3 V4
1: 1 HEJ 700 3 <NA>
2: 1 HEJ 700 3 <NA>
3: 1 HEJ 700 3 <NA>
4: 2 BOS 3000 1 LA
5: 2 BOS 3000 1 LA
6: 2 BOS 3000 1 LA
7: 3 IB 1000 2 FE
8: 3 IB 1000 2 FE
And this is what I've tried without luck:
names <- colnames(mydf[,-c(1)])
for(j in seq_along(nm1)){
set(mydf,
i = which(is.na(mydf[[names[j]]])),
j = names[j],
value = mydf[[names[j]]][is.na(mydf[[names[j]]])])
}
And I also tried this:
mydf[,-c(1] <- lapply(mydf, function(x)
replace(x, is.na(x), x[!is.na(x)]))
> mydf
ID V1 V2 V3 V4
1 1 1 HEJ 700 3
2 1 1 HEJ 700 1
3 1 1 BOS 700 3
4 2 2 BOS 3000 1
5 2 2 BOS 700 1
6 2 2 BOS 3000 1
7 3 3 IB 1000 2
8 3 3 IB 1000 1
I'm sorry if I'm asking a question that's already out there but I wasn't able to find it. I hope somebody can help me cleaning up my messy data :)
Upvotes: 1
Views: 64
Reputation: 887691
We can use na.locf
from zoo
library(data.table)
setDT(df1)[, na.locf(.SD), by = ID, .SDcols = V2:V4]
Upvotes: 0
Reputation: 389175
We can use .SDcols
to apply the function to multiple columns.
library(data.table)
cols <- names(mydf[-1])
setDT(mydf)
mydf[, (cols):= lapply(.SD, function(x)
replace(x, is.na(x), x[!is.na(x)][1])),.SDcols = cols, by = ID]
mydf
# ID V1 V2 V3 V4
#1: 1 HEJ 700 3 <NA>
#2: 1 HEJ 700 3 <NA>
#3: 1 HEJ 700 3 <NA>
#4: 2 BOS 3000 1 LA
#5: 2 BOS 3000 1 LA
#6: 2 BOS 3000 1 LA
#7: 3 IB 1000 2 FE
#8: 3 IB 1000 2 FE
Upvotes: 1
Reputation: 40161
One dplyr
and tidyr
possibilty could be:
mydf %>%
group_by(ID) %>%
fill(-ID, .direction = "downup")
ID V1 V2 V3 V4
<dbl> <fct> <dbl> <dbl> <fct>
1 1 HEJ 700 3 <NA>
2 1 HEJ 700 3 <NA>
3 1 HEJ 700 3 <NA>
4 2 BOS 3000 1 LA
5 2 BOS 3000 1 LA
6 2 BOS 3000 1 LA
7 3 IB 1000 2 FE
8 3 IB 1000 2 FE
Upvotes: 1