Julius Hess

Reputation: 13

Creating multiple variables with for-loop and "paste"

library(tidyverse)
library(magrittr)

df <- data.frame(year = c(1977:1981), set852 = c(1,1,0,0,0), set857=c(0,0,1,1,0), set874=c(0,0,0,1,1))

For each variable set852, set857, and so on (the real dataset has a long list of them), I want to create a variable that indicates whether there is a change in the time series (the values would be "start", "end" and "no change"). The additional variables should look like this:

df_final <- data.frame(year = c(1977:1981), c852 = c("start","end","no change","no change","no change"), c857=c("no change","no change","start","end","no change"), c874=c("no change","no change","no change","start","end"))

I tried this within the tidyverse with a for-loop, mutate, paste and case_when:

set_num <- as.integer(str_extract(colnames(df), "[0-9]+"))

for (i in 2:nrow(df))
{
  df %<>% mutate(paste0("c", set_num[[i]]) = case_when(paste("set", set_num[[i]], sep="")==1 & year == 1977 ~ "start",
             paste("set", set_num[[i]], sep="")==1 & lag(paste("set", set_num[[i]], sep=""))==0 ~ "start",
             paste("set", set_num[[i]], sep="")==1 & lead(paste("set", set_num[[i]], sep=""))==0 ~ "end",
TRUE~"no change"))
}

However, the paste0() call on the left-hand side of mutate is not evaluated as a function; it is treated as the name of a variable that starts with "paste0("c"..." and so forth. How do I get the code to evaluate paste0() as a function call rather than treat it as a string?

Edit: There seems to be confusion about what constitutes a change. A sequence of 1-1-1-0-0 would be start-nochange-end-nochange-nochange.

Upvotes: 1

Views: 82

Answers (3)

jay.sf

Reputation: 72919

You could use matrixStats::rowCumsums. The advantage is that the row calculations run in compiled code, which is much faster. We take the row-wise cumulative sums modulo (length(v) - 1), add one, and replace the positions where the original value is 0 with length(v); the result is used to subset our value vector v. Finally, we inject an array with the original dimensions into our data frame. Using more interesting data:

> v <- c('end', 'start', 'no change')
> l <- length(v)
> df[-1] <- array(v[
+   replace(
+     matrixStats::rowCumsums(as.matrix(df[-1])) %% (l - 1) + 1, 
+     df[-1] == 0, 
+     l)  
+ ], dim=dim(df[-1]))
> df
  year    set852    set857    set874  set852.1    set853   set8744
1 1977     start no change no change       end no change no change
2 1978     start no change no change       end no change no change
3 1979 no change     start no change no change       end no change
4 1980 no change     start       end no change     start       end
5 1981 no change no change     start no change no change       end

Reviewing the other answers, there seems to be confusion about your logic. I assumed that 1 indicates a change; accordingly, a row such as 0-1-0-1-0 should become "no change"-"start"-"no change"-"end"-"no change".
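To make the indexing easier to follow, here is a minimal sketch that traces the same modulo trick on a single vector, using base cumsum instead of matrixStats::rowCumsums. The vector x and the helper name idx are made up for illustration; only v and l come from the code above.

```r
v <- c("end", "start", "no change")
l <- length(v)

# Hypothetical single row, same pattern as the 0-1-0-1-0 example
x <- c(0, 1, 0, 1, 0)

# Running count of 1s: odd counts map to index 2 ("start"),
# even counts map to index 1 ("end")
idx <- cumsum(x) %% (l - 1) + 1

# Positions that were 0 in the input point at index 3 ("no change")
idx[x == 0] <- l

res <- v[idx]
res
#> [1] "no change" "start"     "no change" "end"       "no change"
```

So each 1 alternates between "start" and "end" along the row, and every 0 is "no change".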


Data:

> dput(df)
structure(list(year = 1977:1981, set852 = c(1L, 1L, 0L, 0L, 0L
), set857 = c(0L, 0L, 1L, 1L, 0L), set874 = c(0L, 0L, 0L, 1L, 
1L), set852.1 = c(1L, 1L, 0L, 0L, 0L), set853 = c(0L, 0L, 1L, 
1L, 0L), set8744 = c(0L, 0L, 0L, 1L, 1L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

looks like:

> df
  year set852 set857 set874 set852.1 set853 set8744
1 1977      1      0      0        1      0       0
2 1978      1      0      0        1      0       0
3 1979      0      1      0        0      1       0
4 1980      0      1      1        0      1       1
5 1981      0      0      1        0      0       1

Upvotes: 0

I_O

Reputation: 6911

Another approach with base R:

get_states <- \(xs){
  (rle(xs))$lengths |>
            Map(f = \(len) rep('no change', len) |>
                           replace(len, 'end') |>
                           replace(1, 'start')                           
                ) |>
            Reduce(f = c)
}

df_final <-  cbind(df[1],
                   df[-1] |>
                   Map(f = get_states)
                   )
## > df
##   year set852 set857 set874
## 1 1977      1      0      0
## 2 1978      1      0      0
## 3 1979      0      1      0
## 4 1980      0      1      1
## 5 1981      0      0      1
## > df_final
##   year    set852 set857    set874
## 1 1977     start  start     start
## 2 1978       end    end no change
## 3 1979     start  start       end
## 4 1980 no change    end     start
## 5 1981       end  start       end
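Note that the version above labels runs of 0 as well as runs of 1. If only runs of 1 should be marked, as the edit to the question suggests (1-1-1-0-0 gives start, no change, end, no change, no change), a possible variant is sketched below. The name get_states_ones is hypothetical, and it assumes that a run of a single 1 should be labelled "start".

```r
df <- data.frame(
  year = 1977:1981,
  set852 = c(1, 1, 0, 0, 0),
  set857 = c(0, 0, 1, 1, 0),
  set874 = c(0, 0, 0, 1, 1)
)

# Hypothetical helper: label only the first and last element of each run of 1s
get_states_ones <- function(xs) {
  runs <- rle(xs)
  ends <- cumsum(runs$lengths)        # last position of each run
  starts <- ends - runs$lengths + 1   # first position of each run
  out <- rep("no change", length(xs))
  is_one <- runs$values == 1
  out[ends[is_one]] <- "end"
  # Assigned last, so a run of length one ends up as "start" (an assumption)
  out[starts[is_one]] <- "start"
  out
}

states <- lapply(df[-1], get_states_ones)
names(states) <- sub("^set", "c", names(states))
df_final <- cbind(df["year"], as.data.frame(states))
df_final
```

On the example data this should reproduce the c852, c857 and c874 columns requested in the question.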

Upvotes: 0

stefan

Reputation: 124213

Instead of a for loop you could achieve your desired result using dplyr::across like so:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  year = c(1977:1981),
  set852 = c(1, 1, 0, 0, 0),
  set857 = c(0, 0, 1, 1, 0),
  set874 = c(0, 0, 0, 1, 1)
)

myfun <- function(.x, year) {
  case_when(
    .x == 1 & year == 1977 ~ "start",
    .x == 1 & lag(.x) == 0 ~ "start",
    .x == 1 & lead(.x) == 0 ~ "end",
    .default = "no change"
  )
}

set_cols <- grep("\\d+$", names(df), value = TRUE)

df |>
  mutate(
    across(all_of(set_cols), ~ myfun(.x, year),
      .names = "{gsub('^.*?(\\\\d+)$', 'c\\\\1', .col)}"
    )
  ) |> 
  select(-all_of(set_cols))
#>   year      c852      c857      c874
#> 1 1977     start no change no change
#> 2 1978       end no change no change
#> 3 1979 no change     start no change
#> 4 1980 no change       end     start
#> 5 1981 no change no change no change
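If you would rather keep the for loop, here is a rough sketch of how the computed name can be made to work. Inside mutate, the left-hand side of = is always taken as a literal name, so the name has to be injected with !! and :=, and the input column looked up through the .data pronoun. The names set_cols, new_col and res are made up for illustration. Unlike the code above, this sketch passes default = 0 to lead(), which treats the last row as the end of a run; that is an assumption made so that 1981 in c874 becomes "end", as in the desired df_final. The loop also runs over the set columns rather than 2:nrow(df).

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  year = 1977:1981,
  set852 = c(1, 1, 0, 0, 0),
  set857 = c(0, 0, 1, 1, 0),
  set874 = c(0, 0, 0, 1, 1)
)

# Loop over the set columns themselves, not over row indices
set_cols <- grep("^set\\d+$", names(df), value = TRUE)

res <- df
for (col in set_cols) {
  new_col <- sub("^set", "c", col)  # e.g. "set852" -> "c852"
  res <- res |>
    mutate(
      # !! injects the string as the new column name; := allows that on the LHS
      !!new_col := case_when(
        .data[[col]] == 1 & year == 1977 ~ "start",
        .data[[col]] == 1 & lag(.data[[col]]) == 0 ~ "start",
        # default = 0 is an assumption: data ending on a 1 counts as "end"
        .data[[col]] == 1 & lead(.data[[col]], default = 0) == 0 ~ "end",
        TRUE ~ "no change"
      )
    )
}

res <- res |> select(-all_of(set_cols))
res
```

That said, across is usually the more idiomatic choice here; the loop is mainly useful for seeing what the original attempt was missing.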

Upvotes: 1
