Maximilian

Reputation: 121

Replace loop with more efficient solution

My data looks like this:

test <- structure(list(value = c(0, 781, 1109, 57, 250, 541, 533, 320, 
322, 1033, 291, 2213, 1845, 618, 271, 525, 88, 1354, 217, 820, 
786, 119, 41, 316, 153, 378, 172, 615, 383, 168, 1448, 824, 85, 
224310, 1186, 1488, 244, 368, 133, 488, 118, 4505, 1411, 649, 
690, 548, 226, 393, 1042, 92, 521, 212, 1015, 380, 2944, 54376, 
1396, 429, 2725, 171, 1874, 87, 547, 488, 140, 169, 237, 1749, 
1144, 156, 843, 116, 313, 601, 679, 464, 1092, 178, 28, 57, 550, 
498, 64, 48143, 352, 4100, 232, 1936, 189, 940, 180, 1051, 2917, 
2397, 229, 802, 540, 297, 505, 1649), count = c(1L, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 3L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4L, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
)), row.names = c(NA, -100L), class = c("tbl_df", "tbl", "data.frame"
))

Column value has some random values and column count is mostly filled with NAs. What I need in the end is for every NA in count to be replaced by the last value that was not NA. So the first rows should have count == 1, and as soon as count changes to 2, the following rows should have count == 2. So far I am using a loop:

# Start at row 2: the first row has no previous value to copy from
for (i in seq_along(test$count)[-1]) {
  if (is.na(test$count[i])) {
    test$count[i] <- test$count[i - 1]
  }
}

However, this takes forever! Can anyone think of a more efficient way to get the same result as the loop? This would help me out a lot! Thanks in advance!

Upvotes: 0

Views: 70

Answers (3)

akrun

Reputation: 886938

We can also use na.locf0 from zoo, which always returns a vector of the same length as its input:

library(zoo)
transform(test, count = na.locf0(count))

Or use nafill from data.table for an efficient version (setDT converts test to a data.table by reference):

library(data.table)
setDT(test)[, count := nafill(count, type = "locf")]

Output:

test
#      value count
#  1:      0     1
#  2:    781     1
#  3:   1109     1
#  4:     57     1
#  5:    250     1
#  6:    541     1
#  7:    533     1
#  8:    320     1
#  9:    322     1
# 10:   1033     1
# 11:    291     1
# 12:   2213     1
# 13:   1845     1
# 14:    618     1
# ..
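
For intuition about what "last observation carried forward" does, here is a minimal base R sketch. It is not part of the answer above, and the helper name locf_base is made up for illustration; it assumes a plain atomic vector.

```r
# Hypothetical helper: carry the last non-NA value forward without any package
locf_base <- function(x) {
  # Number of non-NA values seen so far (0 before the first one)
  idx <- cumsum(!is.na(x))
  # Prepend NA so that idx == 0 (a leading gap) maps to NA
  c(NA, x[!is.na(x)])[idx + 1]
}

x <- c(1L, NA, NA, 2L, NA)
filled <- locf_base(x)
```

Because it uses only vectorised operations, this avoids the per-row assignment that makes the loop slow.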

Upvotes: 1

Duck

Reputation: 39585

You can also use na.locf() from zoo:

library(zoo)
# na.rm = FALSE keeps any leading NA, so the result has the same length as the column
test$count <- na.locf(test$count, na.rm = FALSE)

Output:

# A tibble: 100 x 2
   value count
   <dbl> <int>
 1     0     1
 2   781     1
 3  1109     1
 4    57     1
 5   250     1
 6   541     1
 7   533     1
 8   320     1
 9   322     1
10  1033     1
# ... with 90 more rows
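
One caveat worth checking: with its default na.rm = TRUE, na.locf() drops leading NAs, so the result can be shorter than the input. The sample data starts with a non-NA count, so it is not affected, but a small made-up vector shows the difference:

```r
library(zoo)

# Toy vector (not from the question) that starts with an NA
x <- c(NA, 1L, NA, NA, 2L, NA)

# Default na.rm = TRUE: the leading NA is removed, result is shorter than x
short <- na.locf(x)

# na.rm = FALSE: the leading NA is kept, result has the same length as x
full <- na.locf(x, na.rm = FALSE)
```

If the column might start with NA, na.rm = FALSE is the safer choice when assigning back into a data frame.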

Upvotes: 2

Allan Cameron

Reputation: 173793

You can use fill from the tidyr package to do exactly this:

tidyr::fill(test, count)
#> # A tibble: 100 x 2
#>    value count
#>    <dbl> <int>
#>  1     0     1
#>  2   781     1
#>  3  1109     1
#>  4    57     1
#>  5   250     1
#>  6   541     1
#>  7   533     1
#>  8   320     1
#>  9   322     1
#> 10  1033     1
#> # ... with 90 more rows
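
Note that fill() returns a new data frame rather than modifying its input, so the result has to be assigned to keep it. A small sketch on made-up data (the names df and filled are only for illustration):

```r
library(tidyr)

# Toy data frame (not from the question)
df <- data.frame(
  value = c(10, 20, 30, 40, 50),
  count = c(1L, NA, NA, 2L, NA)
)

# Fill NAs downwards (the default direction) and keep the result
filled <- fill(df, count)
```

Here df itself still contains the NAs; only filled has them replaced.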

Upvotes: 2
