Andrea M

Reputation: 2481

Subset error after importing with readr ("arguments imply differing number of rows") in Quarto R chunk

I can't subset rows after importing a CSV with readr::read_csv in a .qmd file.

Data example

Create a file called BSI_test8.csv with these two lines (a header and one data row):

id,data_collection_date,data_collection,reporting_organisation_code,specimen_date,specimen_time,week_no,icu_admission_date,icu_admission_time
478574,01/01/1900,ICU BSI,ABC1,01/01/1900,12:00,1,31/12/1899,23:00
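
For convenience, the same file could also be written from R. This is a minimal sketch, assuming the working directory is the folder that holds the .qmd; `csv_lines` is a name introduced here for illustration. It uses cat() rather than writeLines() so that no trailing newline is added, matching a file without a final empty line.

```r
# Header and data row, exactly as listed above.
csv_lines <- c(
  "id,data_collection_date,data_collection,reporting_organisation_code,specimen_date,specimen_time,week_no,icu_admission_date,icu_admission_time",
  "478574,01/01/1900,ICU BSI,ABC1,01/01/1900,12:00,1,31/12/1899,23:00"
)

# Join the two lines with a single newline and write them without a
# trailing newline, so the file has no final empty line.
cat(paste(csv_lines, collapse = "\n"), file = "BSI_test8.csv")
```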

Code

  1. Create a .qmd file
  2. Insert a new R code chunk
  3. Paste the following code in that chunk:
linelist_pbc_raw <-
    readr::read_csv("./BSI_test8.csv", show_col_types = FALSE)

linelist_pbc_raw[linelist_pbc_raw$specimen_date != linelist_pbc_raw$data_collection_date, ]
  4. Run the chunk.

Error

I get this error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 0, 1

I don't understand what this error means, as both columns have the same number of rows (in this example, 1).
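
A check along the following lines might help narrow this down. It is only a sketch, not a confirmed diagnosis: it separates the comparison and the subsetting from the printing of the result, and `tmp_csv`, `dat`, `keep` and `res` are names introduced here for illustration. It writes the example data to a temporary file so that it can run on its own.

```r
# Illustrative diagnostic; all object names here are hypothetical.
tmp_csv <- tempfile(fileext = ".csv")
cat(
  paste(
    c(
      "id,data_collection_date,data_collection,reporting_organisation_code,specimen_date,specimen_time,week_no,icu_admission_date,icu_admission_time",
      "478574,01/01/1900,ICU BSI,ABC1,01/01/1900,12:00,1,31/12/1899,23:00"
    ),
    collapse = "\n"
  ),
  file = tmp_csv
)

dat <- readr::read_csv(tmp_csv, show_col_types = FALSE)

# First class of each column, as guessed by readr.
print(vapply(dat, function(x) class(x)[1], character(1)))

# The logical index used for subsetting; its length should equal nrow(dat).
keep <- dat$specimen_date != dat$data_collection_date
length(keep)

# Assign the subset instead of letting the chunk print it.
res <- dat[keep, ]
nrow(res)
```

If `nrow(res)` returns without error and the error only appears once `res` is printed, the problem would lie in how the chunk displays the result rather than in the subsetting itself.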

What works

I have found these workarounds:

  1. Running the exact same subsetting code directly from the console

  2. From an R chunk, removing some of the other date/time columns from the dataset before subsetting (in this MRE, just the two time columns):

linelist_pbc_raw |>
    subset(specimen_date != data_collection_date,
           select = -c(specimen_time, icu_admission_time))
  3. From an R chunk, using data.table::fread to import the data before subsetting:
linelist_pbc_raw_datatable <-
    data.table::fread("./BSI_test8.csv")

linelist_pbc_raw_datatable[linelist_pbc_raw_datatable$specimen_date != linelist_pbc_raw_datatable$data_collection_date, ]

My guess is that this is a bug in how readr imports a file that lacks a final empty line, combined with how RStudio runs R chunks within a .qmd file. Any idea why this happens?

R version 4.4.1
RStudio version 2024.9.0.375
OS: Windows 10 x64 (build 19045)

Upvotes: 0

Views: 65

Answers (0)
