Prradep

Reputation: 5696

Extracting identifiers without matching files in a folder

How can I extract the identifiers that do not have a corresponding generated file?

Identifiers given as input for generating the files:

fileIden <- c('a-1','a-2','a-3','b-1','b-2','c-1','d-1','d-2','d-3','d-4')

Checking the files generated:

files <- list.files(".")

files
# [1] "a-2.csv" "a-3.csv" "b-1.csv" "c-1.csv" "d-3.csv"

# Generated here for reproducibility.
# files <- c("a-2.csv", "a-3.csv", "b-1.csv", "c-1.csv", "d-3.csv")

Expected files if the whole process completes successfully:

fileExp <- paste(fileIden, ".csv", sep = "")
# [1] "a-1.csv" "a-2.csv" "a-3.csv" "b-1.csv" "b-2.csv" "c-1.csv" "d-1.csv" "d-2.csv" "d-3.csv" "d-4.csv"

Are any expected files missing?

fileMiss <- fileExp[!fileExp %in% files]
# [1] "a-1.csv" "b-2.csv" "d-1.csv" "d-2.csv" "d-4.csv"

Expected output

# "a-1" "b-2" "d-1" "d-2" "d-4"

I am sure there is an easy, direct way to get the above output without creating the intermediate variables fileExp and fileMiss. Could you please guide me there?

Upvotes: 0

Views: 52

Answers (2)

e.matt

Reputation: 886

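A less elegant approach:

# `files` is the vector from the question.
# substr(files, 1, 3) assumes every identifier is exactly 3 characters long.
result <- ifelse(fileIden %in% substr(files, 1, 3), "", fileIden)
result[result != ""]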
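The same idea can be written without the fixed-width substr() call and without the placeholder strings. This is only a sketch, assuming every generated file ends in ".csv"; the names present and missing are introduced here for illustration and are not from the original answer.

```r
fileIden <- c("a-1", "a-2", "a-3", "b-1", "b-2", "c-1", "d-1", "d-2", "d-3", "d-4")
files <- c("a-2.csv", "a-3.csv", "b-1.csv", "c-1.csv", "d-3.csv")

# Strip a trailing ".csv" only; the "$" anchor keeps the match at the end of the name.
present <- sub("\\.csv$", "", files)

# Keep the identifiers that have no matching file (logical indexing, no ifelse needed).
missing <- fileIden[!fileIden %in% present]
missing
# [1] "a-1" "b-2" "d-1" "d-2" "d-4"
```

Because the extension is removed by pattern rather than by position, this should also cope with identifiers longer than three characters, such as "d-10".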

Upvotes: 0

PKumar

Reputation: 11128

You can do this:

fileIden <- c('a-1','a-2','a-3','b-1','b-2','c-1','d-1','d-2','d-3','d-4')
file <- c("a-2.csv", "a-3.csv" ,"b-1.csv", "c-1.csv", "d-3.csv")


setdiff(fileIden, trimws(gsub("\\.csv","", file)))

Another approach:

setdiff(fileIden, stringr::str_extract(file,"(.*)(?=\\.csv)"))

Logic:

setdiff returns the elements of the first vector that are not present in the second. gsub strips the ".csv" extension from the file names, so combining the two leaves exactly the identifiers that have no matching file.

Output:

#[1] "a-1" "b-2" "d-1" "d-2" "d-4"
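The logic above can also be wrapped so that it reads the folder directly. The following is a minimal sketch, not part of the original answer: missing_ids is a hypothetical helper name, and it assumes the generated files all carry a ".csv" extension. It relies on tools::file_path_sans_ext() from base R's tools package to drop the extension. A temporary directory stands in for the real output folder so the example is reproducible.

```r
fileIden <- c("a-1", "a-2", "a-3", "b-1", "b-2", "c-1", "d-1", "d-2", "d-3", "d-4")

# Hypothetical helper: identifiers in `ids` with no "<id>.csv" file in `dir`.
missing_ids <- function(ids, dir = ".") {
  found <- list.files(dir, pattern = "\\.csv$")   # only look at .csv files
  setdiff(ids, tools::file_path_sans_ext(found))  # drop extension, then compare
}

# Reproducible demo: create the five "generated" files in a temporary folder.
demo_dir <- tempfile("generated_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir,
  c("a-2.csv", "a-3.csv", "b-1.csv", "c-1.csv", "d-3.csv"))))

out <- missing_ids(fileIden, demo_dir)
out
# [1] "a-1" "b-2" "d-1" "d-2" "d-4"
```

In a real run, calling missing_ids(fileIden) with the default dir = "." would check the current working directory, as in the question.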

Upvotes: 1
