Reputation: 335

turn many columns to NA, based on a certain variable

I would like to turn several columns after a certain one to NA, depending on the value of another column.

Here the initial dataset:

    x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
1   a   b   d   a   a   d   b   4
2   b   d   d   a   b   a   b   3
3   b   a   d   a   a   c   a   4
4   c   b   c   a   a   c   a   1

reproducible with:

data <- data.frame(x.1 = c("a","b","b","c"), x.2 = c("b","d","a","b"), x.3 = c("d","d","d","c"), x.4 = c("a","a","a","a"), x.5 = c("a","b","a","a"),x.6 = c("d","a","c","c"), x.7 = c("b","b","a","a"), t = c(4,3,4,1))

Here the dataset I would like to obtain, with NA for each x-column, after the corresponding t. I.e. in both rows 1 and 3, t is equal to 4, so nothing will change in x.1, x.2, x.3, x.4, and all the x after that point (x.5, x.6, x.7) become NA. In row 2 t is 3, so from x.4 columns will turn into NA, etc.

    x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
1   a   b   d   a   NA  NA  NA  4
2   b   d   d   NA  NA  NA  NA  3
3   b   a   d   a   NA  NA  NA  4
4   c   NA  NA  NA  NA  NA  NA  1

I will have a large dataset with 156 x-columns, so I'd a quick way to do it.

Upvotes: 2

Answers (6)

Anoushiravan R

Reputation: 21938

We can also use pmap function from purrr. It should be noted that ..8 refers to the value of 8th variable t in this data set in every row.

library(dplyr)
library(purrr)

data %>%
  pmap_dfr(., ~ {x <- c(...); 
  x[(..8+1):(length(x)-1)] <- NA;
  x})

# A tibble: 4 x 8
  x.1   x.2   x.3   x.4   x.5   x.6   x.7       t
  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 a     b     d     a     NA    NA    NA        4
2 b     d     d     NA    NA    NA    NA        3
3 b     a     d     a     NA    NA    NA        4
4 c     NA    NA    NA    NA    NA    NA        1

Upvotes: 2

GKi

Reputation: 39737

Extracting the column names starting with x. using startsWith and getting the number in the column name using substring and convert it to as.integer. Then compare data$t with the column names stored in j if they are < using outer and overwrite the values with NA where the comparison gives TRUE.

i <- which(startsWith(names(data), "x."))     #Get columns starting with x.
j <- as.integer(substring(names(data)[i], 3)) #get the numbers in their names
data[i][outer(data$t, j, `<`)] <- NA          #Overwrite the values with NA
data
#  x.1  x.2  x.3  x.4  x.5  x.6  x.7 t
#1   a    b    d    a <NA> <NA> <NA> 4
#2   b    d    d <NA> <NA> <NA> <NA> 3
#3   b    a    d    a <NA> <NA> <NA> 4
#4   c <NA> <NA> <NA> <NA> <NA> <NA> 1

And in case for the structure of the given data the following will also come to the desired result:

data[1:7][outer(data$t, 1:7, `<`)] <- NA

Upvotes: 0

Ronak Shah

Reputation: 389325

Here's one base R option -

#count length of 'x' columns
cols <- length(grep('x', names(data)))
#Create row index
rowindex <- rep(1:nrow(data), cols - data$t)
#Create column index
colindex <- unlist(sapply(data$t + 1, seq, cols))
#Replace with NA
data[cbind(rowindex, colindex)] <- NA
data

#  x.1  x.2  x.3  x.4  x.5  x.6  x.7 t
#1   a    b    d    a <NA> <NA> <NA> 4
#2   b    d    d <NA> <NA> <NA> <NA> 3
#3   b    a    d    a <NA> <NA> <NA> 4
#4   c <NA> <NA> <NA> <NA> <NA> <NA> 1

Upvotes: 0

elielink

Reputation: 1202

Using a for loop method:

for(i in 1:nrow(data)){data[i, 1:7]= c(data[i,1:data[i,'t']],rep(NA,7-data[i,'t']))    }

Result:

  x.1  x.2  x.3  x.4  x.5  x.6  x.7 t
1   a    b    d    a <NA> <NA> <NA> 4
2   b    d    d <NA> <NA> <NA> <NA> 3
3   b    a    d    a <NA> <NA> <NA> 4
4   c <NA> <NA> <NA> <NA> <NA> <NA> 1

Upvotes: 0

zerz

Reputation: 170

Is iterating through all the rows too slow? There must be a faster way than this, but it does the job

for (r in 1:nrow(data))  data[r, (data$t[r] + 1): (ncol(data)-1)] <- NA

It only works if all the data$t values make sense.

Upvotes: 0

AnilGoyal

Reputation: 26238

data <- data.frame(x.1 = c("a","b","b","c"), x.2 = c("b","d","a","b"), x.3 = c("d","d","d","c"), x.4 = c("a","a","a","a"), x.5 = c("a","b","a","a"),x.6 = c("d","a","c","c"), x.7 = c("b","b","a","a"), t = c(4,3,4,1))
data
#>   x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
#> 1   a   b   d   a   a   d   b 4
#> 2   b   d   d   a   b   a   b 3
#> 3   b   a   d   a   a   c   a 4
#> 4   c   b   c   a   a   c   a 1

library(tidyverse)
data %>%
  mutate(across(starts_with('x'), ~ ifelse(as.numeric(str_remove(cur_column(), 'x.')) > t, NA, .)))
#>   x.1  x.2  x.3  x.4 x.5 x.6 x.7 t
#> 1   a    b    d    a  NA  NA  NA 4
#> 2   b    d    d <NA>  NA  NA  NA 3
#> 3   b    a    d    a  NA  NA  NA 4
#> 4   c <NA> <NA> <NA>  NA  NA  NA 1

^{Created on 2021-06-07 by the reprex package (v2.0.0)}

Upvotes: 2

turn many columns to NA, based on a certain variable

Answers (6)

Related Questions