Reputation: 43
I want to create a vector with rownames of certain rows of my dataframe, but I keep failing and I feel there is something obvious I am missing. My dataframe is extremely large but I have created an example that gives me the exact same problem.
resmakeup <- data.frame("example" = c(4, -3, 2, 1),
row.names = c("number1", "number2", "number3", "number4")
)
selection <- rownames(resmakeup[abs(resmakeup$example) >= 2,])
So, if my table looks like this:
example
number1 4
number2 -3
number3 2
number4 1
I want the "selection" vector to contain number1, number2 and number3, but that is not working. Instead, I get an empty result. I checked whether the dataframe has rownames with has_rownames(), and that returned TRUE. In addition, I checked whether my selection resmakeup[abs(resmakeup$example) >= 2,] works on its own, and it does return the expected values.
What am I doing wrong and how do I fix it?
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] writexl_1.3.1 forcats_0.5.0 stringr_1.4.0 purrr_0.3.4
[5] readr_1.4.0 tidyr_1.1.2 tibble_3.0.4 tidyverse_1.3.0
[9] RColorBrewer_1.1-2 readxl_1.3.1 pheatmap_1.0.12 ggthemes_4.2.4
[13] ggrepel_0.9.1 ggplot2_3.3.3 GEOquery_2.58.0 edgeR_3.32.1
[17] limma_3.46.0 dplyr_1.0.2 DESeq2_1.30.0 SummarizedExperiment_1.20.0
[21] Biobase_2.50.0 MatrixGenerics_1.2.0 matrixStats_0.57.0 GenomicRanges_1.42.0
[25] GenomeInfoDb_1.26.2 IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.0
[29] ashr_2.2-47
loaded via a namespace (and not attached):
[1] fs_1.5.0 bitops_1.0-6 lubridate_1.7.9.2 bit64_4.0.5 httr_1.4.2
[6] tools_4.0.2 backports_1.2.1 R6_2.5.0 irlba_2.3.3 DBI_1.1.1
[11] colorspace_2.0-0 withr_2.4.0 tidyselect_1.1.0 bit_4.0.4 compiler_4.0.2
[16] cli_2.2.0 rvest_0.3.6 xml2_1.3.2 DelayedArray_0.16.0 labeling_0.4.2
[21] scales_1.1.1 SQUAREM_2021.1 genefilter_1.72.0 mixsqp_0.3-43 digest_0.6.27
[26] XVector_0.30.0 pkgconfig_2.0.3 dbplyr_2.0.0 invgamma_1.1 rlang_0.4.10
[31] rstudioapi_0.13 RSQLite_2.2.1 farver_2.0.3 generics_0.1.0 jsonlite_1.7.2
[36] BiocParallel_1.24.1 RCurl_1.98-1.2 magrittr_2.0.1 GenomeInfoDbData_1.2.4 Matrix_1.2-18
[41] fansi_0.4.2 Rcpp_1.0.5 munsell_0.5.0 lifecycle_0.2.0 stringi_1.5.3
[46] zlibbioc_1.36.0 grid_4.0.2 blob_1.2.1 crayon_1.3.4 lattice_0.20-41
[51] haven_2.3.1 splines_4.0.2 annotate_1.68.0 hms_1.0.0 locfit_1.5-9.4
[56] pillar_1.4.7 geneplotter_1.68.0 reprex_0.3.0 XML_3.99-0.5 glue_1.4.2
[61] modelr_0.1.8 vctrs_0.3.6 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
[66] xfun_0.20 xtable_1.8-4 broom_0.7.3 survival_3.1-12 truncnorm_1.0-8
[71] tinytex_0.29 AnnotationDbi_1.52.0 memoise_1.1.0 ellipsis_0.3.1
Upvotes: 0
Views: 2665
Reputation: 160407
When you run into problems, start executing expressions from the outside inwards to find where things start going wrong.
rownames(resmakeup[abs(resmakeup$example) >= 2,])
# NULL
resmakeup[abs(resmakeup$example) >= 2,]
# [1] 4 -3 2
Okay, you cannot get row names from a plain numeric vector.
The culprit here is R's default behavior of dropping the dimensions of a data.frame when the selection leaves only one column, which is always the case for a one-column frame like this one. (A selection that leaves a single row is not dropped by default.) FYI, both dplyr and data.table choose not to follow that frustrating behavior. You can get around it with drop=FALSE.
resmakeup[abs(resmakeup$example) >= 2,, drop = FALSE]
# example
# number1 4
# number2 -3
# number3 2
and therefore
rownames(resmakeup[abs(resmakeup$example) >= 2,, drop = FALSE])
# [1] "number1" "number2" "number3"
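To see the dropping rule in isolation, here is a minimal sketch that reuses the question's data. The `keep`, `dropped`, `kept` and `two_cols` names, and the extra `other` column, are made up for illustration and are not part of the original question.

```r
resmakeup <- data.frame(example = c(4, -3, 2, 1),
                        row.names = c("number1", "number2", "number3", "number4"))
keep <- abs(resmakeup$example) >= 2

dropped <- resmakeup[keep, ]                # only one column left: simplified to a vector
kept    <- resmakeup[keep, , drop = FALSE]  # stays a data.frame

is.data.frame(dropped)  # FALSE
is.data.frame(kept)     # TRUE
rownames(dropped)       # NULL
rownames(kept)          # "number1" "number2" "number3"

# Hypothetical second column, added only to show that a multi-column
# result is not simplified, even without drop = FALSE.
two_cols <- cbind(resmakeup, other = c(10, 20, 30, 40))
is.data.frame(two_cols[keep, ])  # TRUE
```

So the problem only bites when the result has a single column, which is why it can go unnoticed on wider data frames.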
I'll take this opportunity to soap-box a base R function that makes this easier: subset. It is easier to read, and it does not exhibit the drop= "feature".
rownames(subset(resmakeup, abs(example) >= 2))
# [1] "number1" "number2" "number3"
Its use of non-standard evaluation (i.e., the ability to use column names such as example without the df$ prefix) makes it a bit simpler to read, and by default it does not drop dimensions.
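As a quick check of that claim, the sketch below confirms that the result of `subset` is still a data.frame before the row names are taken. The `picked` name is an assumption used only for this illustration.

```r
resmakeup <- data.frame(example = c(4, -3, 2, 1),
                        row.names = c("number1", "number2", "number3", "number4"))

# subset() evaluates the condition inside the data.frame, so `example`
# refers to resmakeup$example here.
picked <- subset(resmakeup, abs(example) >= 2)

is.data.frame(picked)   # TRUE, even though only one column exists
selection <- rownames(picked)
selection               # "number1" "number2" "number3"
```

One caveat worth knowing: the help page for `subset` describes it as a convenience function intended for interactive use, and recommends standard subsetting with `[` inside functions and packages.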
Upvotes: 2
Reputation: 10761
This is an issue with subsetting a data.frame (see this help file for more information). You need to specify drop = FALSE in your subsetting call:
rownames(resmakeup[abs(resmakeup$example) >= 2,,drop = FALSE])
# [1] "number1" "number2" "number3"
If you inspect what running resmakeup[abs(resmakeup$example) >= 2,] returns, you'll notice that it's returning a vector and not a data.frame (coercing to the lowest possible dimension). Using drop = FALSE will preserve the data.frame type after subsetting.
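Not mentioned in either answer, but possibly worth considering: since only the names are needed, the logical condition can be applied to the vector of row names directly, which avoids subsetting the data.frame at all. This is an alternative sketch, not the approach the answers describe.

```r
resmakeup <- data.frame(example = c(4, -3, 2, 1),
                        row.names = c("number1", "number2", "number3", "number4"))

# rownames() returns a character vector, and indexing a vector with a
# logical condition has no dimensions to drop.
selection <- rownames(resmakeup)[abs(resmakeup$example) >= 2]
selection  # "number1" "number2" "number3"
```

On a very large data frame this also avoids building the intermediate subset just to read its names.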
Upvotes: 1