nchand

Reputation: 17

Read selected rows from multiple files in a folder in R

I have around 170 parquet files in a folder and am trying to select the rows that match a value in a column, but I get the error below after a few iterations of the loop. The code works if I manually pass in a few files, but not for all of them. Can someone please help?

Error: Problem with `filter()` input `..1`.
i Input `..1` is `pq$service_id %in% serviceroutes`.
x Input `..1` must be of size 1, not size 0.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Unknown or uninitialised column: `service_id`.

Code:

MTWs<-list.files(path=filepath)
serviceroutes<-unique(servicesumGWY$service_id)
outData1 <-data.table()
for (file in MTWs) {
 fp<-paste0(filepath,file)
 pq<- read_parquet(fp)
 dataT <- pq %>% filter(pq$service_id %in% serviceroutes)
 outData1<- rbind(outData1,dataT,fill=TRUE)
}

Upvotes: 0

Views: 39

Answers (1)

Ronak Shah

Reputation: 389012

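The warning says that at least one file has no service_id column. For that file pq$service_id is NULL, so the condition passed to filter() has length 0 and the call fails. Filter the rows only if the service_id column is present in the data.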

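library(arrow)  # assumed source of read_parquet(), as in the question
library(dplyr)
library(purrr)

# full.names = TRUE returns complete paths, so no paste0() is needed
MTWs <- list.files(path = filepath, full.names = TRUE)
serviceroutes <- unique(servicesumGWY$service_id)

outData <- map_df(MTWs, ~{
  tmp <- read_parquet(.x)
  # Files without a service_id column return NULL, which map_df() drops
  if ('service_id' %in% colnames(tmp))
    tmp %>% filter(service_id %in% serviceroutes)
})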
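
If it helps to see which inputs are missing the column before filtering, here is a minimal sketch of the same guard on in-memory data. The three tibbles, their names (a, b, c) and the trips and route columns are hypothetical stand-ins for the parquet files, not taken from the question; only service_id and serviceroutes come from the original code.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the parquet files; table "b" lacks service_id.
tables <- list(
  a = tibble(service_id = c("r1", "r2"), trips = c(10, 20)),
  b = tibble(route = "r1", trips = 5),
  c = tibble(service_id = c("r3", "r1"), trips = c(7, 8))
)
serviceroutes <- c("r1")

# Names of the inputs that would trigger the filter() error.
missing_col <- names(discard(tables, ~ "service_id" %in% colnames(.x)))

# Keep only the inputs that have the column, filter each, then stack them.
outData <- tables %>%
  keep(~ "service_id" %in% colnames(.x)) %>%
  map(~ filter(.x, service_id %in% serviceroutes)) %>%
  bind_rows()
```

With real files, the list would presumably be built by reading each path with read_parquet() first; missing_col then tells you which files to inspect.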

Upvotes: 1
