jorginho97
jorginho97

Reputation: 43

Replace going on NA values with sum of another column

I am trying to replace all going on NA values with sum of values from another column, but I'm a little confused.

How the data looks like

df
#     Distance Distance2   
# 1       160    8    
# 2       20     NA   
# 3       30     15    
# 4       100    11
# 5       35     NA    
# 6       42     NA    
# 7       10     NA     
# 8       10     2   
# 9       9      NA     
# 10      20     NA        

And am looking to get a result like this

df
#     Distance Distance2   
# 1       160    8    
# 2       20     20
# 3       30     15    
# 4       100    11
# 5       35     87    
# 6       42     87
# 7       10     87
# 8       10     2   
# 9       9      29  
# 10      20     29   

Thanks in advance for your help

Upvotes: 1

Views: 257

Answers (3)

akrun
akrun

Reputation: 887048

We can use fcoalesce

library(data.table)
library(zoo)
setDT(df)[, Distance2 := fcoalesce(Distance2, na.aggregate(Distance,  FUN = sum)),
          rleid(Distance2)]

data

df <- structure(list(Distance = c(160L, 20L, 30L, 100L, 35L, 42L, 10L, 
10L, 9L, 20L), Distance2 = c(8L, NA, 15L, 11L, NA, NA, NA, 2L, 
NA, NA)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"))

Upvotes: 0

Sotos
Sotos

Reputation: 51582

You can group by consecutive NAs and replace with the sum, i.e.

library(dplyr)

df %>% 
 group_by(grp = cumsum(c(TRUE, diff(is.na(df$Distance2)) != 0))) %>% 
 mutate(Distance2 = replace(Distance2, is.na(Distance2), sum(Distance)))


# A tibble: 10 x 3
# Groups:   grp [6]
   Distance Distance2   grp
      <int>     <int> <int>
 1      160         8     1
 2       20        20     2
 3       30        15     3
 4      100        11     3
 5       35        87     4
 6       42        87     4
 7       10        87     4
 8       10         2     5
 9        9        29     6
10       20        29     6

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388907

We can use rleid to create groups and replace NA with sum of Distance values.

library(data.table)

setDT(df)[, Distance_new := replace(Distance2, is.na(Distance2), 
            sum(Distance)), rleid(Distance2)]

df
#    Distance Distance2 Distance_new
# 1:      160         8            8
# 2:       20        NA           20
# 3:       30        15           15
# 4:      100        11           11
# 5:       35        NA           87
# 6:       42        NA           87
# 7:       10        NA           87
# 8:       10         2            2
# 9:        9        NA           29
#10:       20        NA           29

We can also use this in dplyr :

library(dplyr)
df %>%
  group_by(gr = rleid(Distance2)) %>%
  mutate(Distance_new = replace(Distance2, is.na(Distance2), sum(Distance)))

data

df <- structure(list(Distance = c(160L, 20L, 30L, 100L, 35L, 42L, 10L, 
10L, 9L, 20L), Distance2 = c(8L, NA, 15L, 11L, NA, NA, NA, 2L, 
NA, NA)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"))

Upvotes: 3

Related Questions