markhogue
markhogue

Reputation: 1179

Add column to data frame with sequence depending on other column

I have two columns of data like this:

enter image description here

I want to add a column or modify the second column resulting in a sequence of integers starting with 1, wherever the 1 already appears. Result changes to:

enter image description here

I can do this with a loop, but what is the "right" R way of doing it?

Here's my loop:

for(i in 1:length(df2$col2)) {
  df2$col3[i] <- ifelse(df2$col2[i] == 1, 1, df2$col3[i - 1] + 1)
  if(is.na(df2$col2[i])) df2$col3[i] <- df2$col3[i - 1] + 1
}

Here is a sample data set with 20 rows:

478.69, 320.45, 503.7, 609.3, 478.19, 419.633683050051, 552.939975773916, 
785.119385505095, 18.2542654918507, 98.6469651805237, 132.587260054424, 
697.119552921504, 512.560374778695, 916.425200179219, 14.3385051051155
), col2 = c(1, NA, 1, NA, NA, 1, NA, 1, NA, NA, NA, NA, 1, NA, 
NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-20L))

Upvotes: 0

Views: 347

Answers (1)

Allan Cameron
Allan Cameron

Reputation: 173803

I don't know if this is the way to do it, but it's one way:

df$col3 <- unlist(sapply(diff(c(which(!is.na(df$col2)), nrow(df) + 1)), seq))

df
#>         col1 col2 col3
#> 1  478.69000    1    1
#> 2  320.45000   NA    2
#> 3  503.70000    1    1
#> 4  609.30000   NA    2
#> 5  478.19000   NA    3
#> 6  478.69000    1    1
#> 7  320.45000   NA    2
#> 8  503.70000    1    1
#> 9  609.30000   NA    2
#> 10 478.19000   NA    3
#> 11 419.63368   NA    4
#> 12 552.93998   NA    5
#> 13 785.11939    1    1
#> 14  18.25427   NA    2
#> 15  98.64697   NA    3
#> 16 132.58726   NA    4
#> 17 697.11955   NA    5
#> 18 512.56037   NA    6
#> 19 916.42520   NA    7
#> 20  14.33851   NA    8

Note that the first 5 values of col1 were missing from your dput, so I added the second 5 numbers twice - they're not relevant to the question anyway.


Data

df <- structure(list(col1 = c(478.69, 320.45, 503.7, 609.3, 478.19,
  478.69, 320.45, 503.7, 609.3, 478.19, 419.633683050051, 552.939975773916, 
785.119385505095, 18.2542654918507, 98.6469651805237, 132.587260054424, 
697.119552921504, 512.560374778695, 916.425200179219, 14.3385051051155
), col2 = c(1, NA, 1, NA, NA, 1, NA, 1, NA, NA, NA, NA, 1, NA, 
NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-20L))

df
#>         col1 col2
#> 1  478.69000    1
#> 2  320.45000   NA
#> 3  503.70000    1
#> 4  609.30000   NA
#> 5  478.19000   NA
#> 6  478.69000    1
#> 7  320.45000   NA
#> 8  503.70000    1
#> 9  609.30000   NA
#> 10 478.19000   NA
#> 11 419.63368   NA
#> 12 552.93998   NA
#> 13 785.11939    1
#> 14  18.25427   NA
#> 15  98.64697   NA
#> 16 132.58726   NA
#> 17 697.11955   NA
#> 18 512.56037   NA
#> 19 916.42520   NA
#> 20  14.33851   NA

Upvotes: 1

Related Questions