Reputation: 491
I have previously posted a question on subsetting columns from row values on GIS StackExchange: here.
In short, I would like to set a value to NA if its column name (e.g. 100) is less than the row's value of s_mean (e.g. 101).
This worked for some datasets, but with the current data it fails with the following error:
Error: Can't subset columns that don't exist.
x Locations 304, 303, 302, 301, 300, etc. don't exist.
i There are only 197 columns.
Run `rlang::last_error()` to see where the error occurred.
Here is the data:
# A tibble: 2,937 x 197
ID doy FireID Year sE NAME L1NAME ID_2 area s_count s_mean s_median s_stdev s_min doydiff ID_E5 32 33 34 35
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2246 173 30048 2015 0 A T 30048 3.86e6 0 100 0 0 0 73 56 267. 265. 264. 265.
2 2275 174 30076 2015 0 A T 30076 2.15e6 0 100 0 0 0 74 533 266. 266. 263. 264.
3 704 294 28542 2015 1381 A T 28542 6.44e5 0 100 0 0 0 194 562 277. 277. 278. 279.
4 711 110 28549 2015 0 NA NA 28549 2.15e5 0 101 0 0 0 9 569 262. 264. 260. 262.
5 690 161 28528 2015 232 A T 28528 4.29e5 0 101 0 0 0 60 580 280. 279. 280. 279.
6 692 331 28530 2015 0 M M 28530 2.15e5 0 101 0 0 0 130 582 280. 279. 281. 280.
7 667 47 28506 2015 232 M M 28506 2.79e6 0 10 0 0 0 37 589 280. 282. 281. 280.
8 672 188 28511 2015 0 NA NA 28511 2.79e6 0 101 0 0 0 87 594 254. 261. 259. 254.
9 657 171 28496 2015 578 NA NA 28496 8.59e5 0 101 0 0 0 170 611 256. 263. 260. 254.
10 635 301 28474 2015 1084 M M 28474 1.50e6 0 101 0 0 0 200 621 282. 282. 282. 281.
The data columns continue up to the column named 212; the remaining columns are not shown here.
Here is the script:
polydata = read_csv("path/E15.csv")
polydata$s_mean <- round(polydata$s_mean)
polydata <- polydata[order(polydata$s_mean),]
# slice each row, and put each slice in a list
df_sub = lapply(1:nrow(polydata),
                function(x) {
                  polydata[x, c(1, 10, polydata$s_mean[x]:187+10)] # + 10 because of the offset: doy_columns start at 11
                })
Why does the error say I am selecting columns that do not exist when I specify 187+10 as the upper bound of the column index?
What should be changed?
I eventually want this to be the outcome (compare the column names to s_mean to better understand the desired output):
ID s_mean 32 33 34 35 36 ... 212
1 30 267 278 270 269 267 ... 298
2 100 NA NA NA NA NA ... 298
3 35 NA NA NA 242 246 ... 298
Upvotes: 0
Views: 924
Reputation: 29109
We can use across() from dplyr and refer to the current column's name with cur_column(). From there, we can use ifelse() to replace the data with NA if the column name is less than s_mean. I created a toy dataset to illustrate the solution; it can be found at the end of this post.
library(dplyr)
pdat1 %>%
  mutate(across(`32`:`35`,
                ~ifelse(s_mean > as.numeric(cur_column()), NA, .)))
#> ID s_mean 32 33 34 35
#> 1 2246 30 267 265 264 265
#> 2 2275 100 NA NA NA NA
#> 3 704 100 NA NA NA NA
#> 4 711 34 NA NA 260 262
#> 5 690 101 NA NA NA NA
#> 6 692 101 NA NA NA NA
#> 7 667 10 280 282 281 280
#> 8 672 101 NA NA NA NA
#> 9 657 101 NA NA NA NA
#> 10 635 101 NA NA NA NA
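As for the error in the question: judging only from the code shown, a likely cause is operator precedence. In R, `:` is evaluated before `+`, so `polydata$s_mean[x]:187+10` is read as `(polydata$s_mean[x]:187) + 10`. The sketch below uses made-up values of `s_mean` (100 and 294 are assumptions for illustration, not values confirmed from the full dataset) to show how this can produce indices beyond 197 columns.

```r
# `:` is evaluated before `+`, so the offset is added to the whole sequence.
s <- 100L                          # assumed s_mean value for illustration
idx_as_written  <- s:187 + 10      # same as (s:187) + 10, i.e. 110, 111, ..., 197
idx_with_parens <- s:(187 + 10)    # 100, 101, ..., 197

range(idx_as_written)              # 110 197
range(idx_with_parens)             # 100 197

# If the start is larger than 187, the sequence counts DOWN from it,
# and adding 10 pushes the indices past the 197 available columns.
s_big <- 294L                      # hypothetical rounded s_mean above 187
head(s_big:187 + 10)               # 304 303 302 301 300 299
```

The last line reproduces the locations listed in the error message, which suggests that at least one row has a rounded s_mean greater than 187. Selecting columns by name rather than by position, as in the across() call above, avoids this kind of offset arithmetic.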
pdat1 <- structure(list(ID = c(2246L, 2275L, 704L, 711L, 690L, 692L, 667L, 672L,
657L, 635L),
s_mean = c(30L, 100L, 100L, 34L, 101L, 101L, 10L, 101L,
101L, 101L),
`32` = c(267, 266, 277, 262, 280, 280, 280, 254, 256, 282),
`33` = c(265, 266, 277, 264, 279, 279, 282, 261, 263, 282),
`34` = c(264, 263, 278, 260, 280, 281, 281, 259, 260, 282),
`35` = c(265, 264, 279, 262, 279, 280, 280, 254, 254, 281)),
class = "data.frame",
row.names = c("1", "2", "3", "4","5", "6", "7", "8", "9", "10"))
#> ID s_mean 32 33 34 35
#> 1 2246 30 267 265 264 265
#> 2 2275 100 266 266 263 264
#> 3 704 100 277 277 278 279
#> 4 711 34 262 264 260 262
#> 5 690 101 280 279 280 279
#> 6 692 101 280 279 281 280
#> 7 667 10 280 282 281 280
#> 8 672 101 254 261 259 254
#> 9 657 101 256 263 260 254
#> 10 635 101 282 282 282 281
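For the full data, where the day columns run up to 212, it may be more convenient not to type the column range by hand. The following is a minimal base R sketch of the same masking rule, assuming that every column whose name consists only of digits is a day-of-year column. The data frame `toy` and the helper names `doy_cols`, `doy`, `mask` and `vals` are made up for this illustration.

```r
# Toy data in the same shape as pdat1 (values are illustrative only).
toy <- data.frame(ID = c(1L, 2L, 3L),
                  s_mean = c(30L, 100L, 34L),
                  `32` = c(267, 266, 262),
                  `33` = c(265, 266, 264),
                  `34` = c(264, 263, 260),
                  `35` = c(265, 264, 262),
                  check.names = FALSE)

# Assumption: columns named with digits only are the day-of-year columns.
doy_cols <- grep("^[0-9]+$", names(toy), value = TRUE)
doy <- as.numeric(doy_cols)

# mask[i, j] is TRUE where row i's s_mean is greater than column j's day.
mask <- outer(toy$s_mean, doy, ">")

# Blank out the masked cells and write the columns back.
vals <- as.matrix(toy[doy_cols])
vals[mask] <- NA
toy[doy_cols] <- vals
toy
```

With these toy values, row 1 (s_mean 30) keeps all four values, row 2 (s_mean 100) becomes NA throughout, and row 3 (s_mean 34) loses only columns 32 and 33.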
Upvotes: 1