Reputation: 43
I am trying to replace all going on NA values with sum of values from another column, but I'm a little confused.
How the data looks like
df
# Distance Distance2
# 1 160 8
# 2 20 NA
# 3 30 15
# 4 100 11
# 5 35 NA
# 6 42 NA
# 7 10 NA
# 8 10 2
# 9 9 NA
# 10 20 NA
And am looking to get a result like this
df
# Distance Distance2
# 1 160 8
# 2 20 20
# 3 30 15
# 4 100 11
# 5 35 87
# 6 42 87
# 7 10 87
# 8 10 2
# 9 9 29
# 10 20 29
Thanks in advance for your help
Upvotes: 1
Views: 257
Reputation: 887048
We can use fcoalesce
library(data.table)
library(zoo)
setDT(df)[, Distance2 := fcoalesce(Distance2, na.aggregate(Distance, FUN = sum)),
rleid(Distance2)]
df <- structure(list(Distance = c(160L, 20L, 30L, 100L, 35L, 42L, 10L,
10L, 9L, 20L), Distance2 = c(8L, NA, 15L, 11L, NA, NA, NA, 2L,
NA, NA)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"))
Upvotes: 0
Reputation: 51582
You can group by consecutive NAs and replace with the sum, i.e.
library(dplyr)
df %>%
group_by(grp = cumsum(c(TRUE, diff(is.na(df$Distance2)) != 0))) %>%
mutate(Distance2 = replace(Distance2, is.na(Distance2), sum(Distance)))
# A tibble: 10 x 3
# Groups: grp [6]
Distance Distance2 grp
<int> <int> <int>
1 160 8 1
2 20 20 2
3 30 15 3
4 100 11 3
5 35 87 4
6 42 87 4
7 10 87 4
8 10 2 5
9 9 29 6
10 20 29 6
Upvotes: 2
Reputation: 388907
We can use rleid
to create groups and replace NA
with sum
of Distance
values.
library(data.table)
setDT(df)[, Distance_new := replace(Distance2, is.na(Distance2),
sum(Distance)), rleid(Distance2)]
df
# Distance Distance2 Distance_new
# 1: 160 8 8
# 2: 20 NA 20
# 3: 30 15 15
# 4: 100 11 11
# 5: 35 NA 87
# 6: 42 NA 87
# 7: 10 NA 87
# 8: 10 2 2
# 9: 9 NA 29
#10: 20 NA 29
We can also use this in dplyr
:
library(dplyr)
df %>%
group_by(gr = rleid(Distance2)) %>%
mutate(Distance_new = replace(Distance2, is.na(Distance2), sum(Distance)))
data
df <- structure(list(Distance = c(160L, 20L, 30L, 100L, 35L, 42L, 10L,
10L, 9L, 20L), Distance2 = c(8L, NA, 15L, 11L, NA, NA, NA, 2L,
NA, NA)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"))
Upvotes: 3