Reputation: 335
I would like to turn several columns after a certain one to NA, depending on the value of another column.
Here the initial dataset:
x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
1 a b d a a d b 4
2 b d d a b a b 3
3 b a d a a c a 4
4 c b c a a c a 1
reproducible with:
data <- data.frame(x.1 = c("a","b","b","c"), x.2 = c("b","d","a","b"), x.3 = c("d","d","d","c"), x.4 = c("a","a","a","a"), x.5 = c("a","b","a","a"),x.6 = c("d","a","c","c"), x.7 = c("b","b","a","a"), t = c(4,3,4,1))
Here the dataset I would like to obtain, with NA for each x-column, after the corresponding t. I.e. in both rows 1 and 3, t is equal to 4, so nothing will change in x.1, x.2, x.3, x.4, and all the x after that point (x.5, x.6, x.7) become NA. In row 2 t is 3, so from x.4 columns will turn into NA, etc.
x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
1 a b d a NA NA NA 4
2 b d d NA NA NA NA 3
3 b a d a NA NA NA 4
4 c NA NA NA NA NA NA 1
I will have a large dataset with 156 x-columns, so I'd a quick way to do it.
Upvotes: 2
Views: 72
Reputation: 21908
We can also use pmap
function from purrr
. It should be noted that ..8
refers to the value of 8th variable t
in this data set in every row.
library(dplyr)
library(purrr)
data %>%
pmap_dfr(., ~ {x <- c(...);
x[(..8+1):(length(x)-1)] <- NA;
x})
# A tibble: 4 x 8
x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 a b d a NA NA NA 4
2 b d d NA NA NA NA 3
3 b a d a NA NA NA 4
4 c NA NA NA NA NA NA 1
Upvotes: 2
Reputation: 39657
Extracting the column names starting with x.
using startsWith
and getting the number in the column name using substring
and convert it to as.integer
. Then compare data$t
with the column names stored in j
if they are <
using outer
and overwrite the values with NA
where the comparison gives TRUE
.
i <- which(startsWith(names(data), "x.")) #Get columns starting with x.
j <- as.integer(substring(names(data)[i], 3)) #get the numbers in their names
data[i][outer(data$t, j, `<`)] <- NA #Overwrite the values with NA
data
# x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
#1 a b d a <NA> <NA> <NA> 4
#2 b d d <NA> <NA> <NA> <NA> 3
#3 b a d a <NA> <NA> <NA> 4
#4 c <NA> <NA> <NA> <NA> <NA> <NA> 1
And in case for the structure of the given data the following will also come to the desired result:
data[1:7][outer(data$t, 1:7, `<`)] <- NA
Upvotes: 0
Reputation: 388982
Here's one base R option -
#count length of 'x' columns
cols <- length(grep('x', names(data)))
#Create row index
rowindex <- rep(1:nrow(data), cols - data$t)
#Create column index
colindex <- unlist(sapply(data$t + 1, seq, cols))
#Replace with NA
data[cbind(rowindex, colindex)] <- NA
data
# x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
#1 a b d a <NA> <NA> <NA> 4
#2 b d d <NA> <NA> <NA> <NA> 3
#3 b a d a <NA> <NA> <NA> 4
#4 c <NA> <NA> <NA> <NA> <NA> <NA> 1
Upvotes: 0
Reputation: 1202
Using a for loop method:
for(i in 1:nrow(data)){data[i, 1:7]= c(data[i,1:data[i,'t']],rep(NA,7-data[i,'t'])) }
Result:
x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
1 a b d a <NA> <NA> <NA> 4
2 b d d <NA> <NA> <NA> <NA> 3
3 b a d a <NA> <NA> <NA> 4
4 c <NA> <NA> <NA> <NA> <NA> <NA> 1
Upvotes: 0
Reputation: 170
Is iterating through all the rows too slow? There must be a faster way than this, but it does the job
for (r in 1:nrow(data)) data[r, (data$t[r] + 1): (ncol(data)-1)] <- NA
It only works if all the data$t
values make sense.
Upvotes: 0
Reputation: 26218
data <- data.frame(x.1 = c("a","b","b","c"), x.2 = c("b","d","a","b"), x.3 = c("d","d","d","c"), x.4 = c("a","a","a","a"), x.5 = c("a","b","a","a"),x.6 = c("d","a","c","c"), x.7 = c("b","b","a","a"), t = c(4,3,4,1))
data
#> x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
#> 1 a b d a a d b 4
#> 2 b d d a b a b 3
#> 3 b a d a a c a 4
#> 4 c b c a a c a 1
library(tidyverse)
data %>%
mutate(across(starts_with('x'), ~ ifelse(as.numeric(str_remove(cur_column(), 'x.')) > t, NA, .)))
#> x.1 x.2 x.3 x.4 x.5 x.6 x.7 t
#> 1 a b d a NA NA NA 4
#> 2 b d d <NA> NA NA NA 3
#> 3 b a d a NA NA NA 4
#> 4 c <NA> <NA> <NA> NA NA NA 1
Created on 2021-06-07 by the reprex package (v2.0.0)
Upvotes: 2