I have code that filters grouped rows so that only one row is left per group, unless multiple rows pass all my filters. This code worked fine yesterday. However, today I downloaded some new packages (for analyzing gene enrichment), and since then my filtering code gives me a new error:
SD1 <- df %>%
  group_by(group) %>%
  filter(if(n() > 1) {(Score > SD) } else TRUE) %>%
  slice_max(count1, n = 1) %>%
  slice_max(count2, n = 1)

PPI <- df %>%
  group_by(group) %>%
  dplyr::filter(if(n() > 1) {(Score < SD) } else TRUE) %>%
  dplyr::filter(dplyr::between(Score, Average, SD)) %>%
  slice_max(count1, n = 1) %>%
  slice_max(count2, n = 1) %>%
  subset(!(group %in% SD1$group)) %>%
  ungroup()
Error: Problem with `filter()` input `..1`.
x `left` must be length 1
i Input `..1` is `dplyr::between(Score, Average, SD)`.
i The error occurred in group 1: group = 1.
I've seen similar questions but trying to apply their answers hasn't worked in my use-case.
Is there a coding reason why this has appeared or is it a problem dplyr is having with a new conflicting package I've installed in RStudio? It's the only thing I've done differently that I know of.
The data it is filtering looks like:
group gene  Score Average SD   count1 count2
1     gene1 0.1   0.43    0.75 0      1
1     gene2 0.5   0.43    0.75 0      23
1     gene3 0.7   0.43    0.75 1      45
2     gene4 0.88  0.7     0.75
Session info:
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 purrr_0.3.4 readr_1.4.0 tibble_3.0.6
[6] ggplot2_3.3.3 tidyverse_1.3.0 tidyr_1.1.2 dplyr_1.0.3 data.table_1.13.6
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 pillar_1.4.7 compiler_4.0.2 cellranger_1.1.0 dbplyr_2.0.0
[6] tools_4.0.2 jsonlite_1.7.2 lubridate_1.7.9.2 lifecycle_0.2.0 gtable_0.3.0
[11] pkgconfig_2.0.3 rlang_0.4.10 reprex_1.0.0 cli_2.3.0 rstudioapi_0.13
[16] DBI_1.1.1 haven_2.3.1 withr_2.4.1 xml2_1.3.2 httr_1.4.2
[21] fs_1.5.0 generics_0.1.0 vctrs_0.3.6 hms_1.0.0 grid_4.0.2
[26] tidyselect_1.1.0 glue_1.4.2 R6_2.5.0 readxl_1.3.1 modelr_0.1.8
[31] magrittr_2.0.1 backports_1.2.1 scales_1.1.1 ellipsis_0.3.1 rvest_0.3.6
[36] assertthat_0.2.1 colorspace_2.0-0 stringi_1.5.3 munsell_0.5.0 broom_0.7.4
[41] crayon_1.4.0
The error occurs in the last step, `subset` (in the tidyverse it would be `filter`), because the "UpperSD_Genes" dataset (`SD1` in the question) has 0 rows, so no values can be extracted from its column:
nrow(UpperSD_Genes)
#[1] 0
That is, the pipeline works fine up to this point:
df %>%
  group_by(group) %>%
  dplyr::filter(if(n() > 1) {(RFR_Score < Upper_SD_Threshold) } else TRUE) %>%
  dplyr::filter(dplyr::between(RFR_Score, AvgScore_Per_Group, Upper_SD_Threshold)) %>%
  slice_max(direct_PPI_count, n = 1) %>%
  slice_max(secondary_PPI_count, n = 1)
# A tibble: 1 x 7
# Groups: group [1]
# group gene RFR_Score AvgScore_Per_Group Upper_SD_Threshold direct_PPI_count secondary_PPI_count
# <int> <chr> <dbl> <dbl> <dbl> <int> <int>
#1 1 gene3 0.7 0.43 0.75 1 45
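Also, the specific error in the OP's code comes from the `left` and `right` arguments of `between`, which must each be of length 1. Inside a group with more than one row, `AvgScore_Per_Group` and `Upper_SD_Threshold` are whole column vectors, so `between` throws the length error. To circumvent this, we can take the `first` value of each: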
df %>%
  group_by(group) %>%
  dplyr::filter(if(n() > 1) {(RFR_Score < Upper_SD_Threshold) } else TRUE) %>%
  dplyr::filter(dplyr::between(RFR_Score, first(AvgScore_Per_Group), first(Upper_SD_Threshold)))
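Another option is to avoid `between()` altogether. Below is a minimal sketch that assumes the column names from the question (`Score`, `Average`, `SD`, `count1`, `count2`). The element-wise operators `>=` and `<=` are vectorised, so each row's `Score` is compared against that same row's `Average` and `SD`, with no length-1 requirement. It also does not rely on every row in a group sharing the same bounds, which `first()` does. The sample data is rebuilt from the question. The `count1`/`count2` values for gene4 are missing there, so `NA` is used as a placeholder. To my knowledge, dplyr 1.1.0 and later also accept vector `left`/`right` in `between()`, but the explicit comparison works regardless of the installed version.

```r
library(dplyr)

# Sample data rebuilt from the question; gene4's count1/count2 are
# missing in the question, so NA is used here as a placeholder.
df <- tibble(
  group   = c(1L, 1L, 1L, 2L),
  gene    = c("gene1", "gene2", "gene3", "gene4"),
  Score   = c(0.1, 0.5, 0.7, 0.88),
  Average = c(0.43, 0.43, 0.43, 0.7),
  SD      = c(0.75, 0.75, 0.75, 0.75),
  count1  = c(0L, 0L, 1L, NA),
  count2  = c(1L, 23L, 45L, NA)
)

res <- df %>%
  group_by(group) %>%
  # only require Score < SD when the group has more than one row
  filter(if (n() > 1) Score < SD else TRUE) %>%
  # row-wise equivalent of between(Score, Average, SD), inclusive bounds
  filter(Score >= Average, Score <= SD) %>%
  slice_max(count1, n = 1) %>%
  slice_max(count2, n = 1) %>%
  ungroup()

res
```

With these sample rows, only gene3 in group 1 remains, which matches the output shown earlier in this answer. gene4 is dropped because 0.88 exceeds its SD of 0.75.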