Reputation: 196
I have a bibliographic directory (/Biblio) with 66 subdirectories (/01 folder, /02 folder, … /66 folder). Each contains a varying number of files with different extensions (e.g. pdf, txt, csv, …), as well as subfolders holding files with similar extensions, but I am not interested in the contents of these sub-subfolders. Some subfolders do not contain any “pdf” files. I want to count the number of “pdf” files in each subfolder.
I can list the pdf files in all subfolders of “/Biblio” with:
BiblioPath = "C:/Biblio"
BiblioDir = list.dirs(path = BiblioPath, full.names = TRUE, recursive = FALSE)
BiblioFiles = list.files(path = BiblioDir, pattern = "pdf", recursive = FALSE, full.names = TRUE)
(Note: the string “pdf” never occurs in my filenames.) “BiblioFiles” is the full list of the pdf files, but I do not know how to count how many “pdf” files are in each subdirectory without a loop.
Upvotes: 4
Views: 9664
Reputation: 196
I thank @Richard Border and @alistaire for their prompt, similar, simple and elegant answers. Since they were posted as comments, I have copied the one I like best as an answer:
sapply(BiblioDir,function(dir){length(list.files(dir,pattern='pdf'))})
It works perfectly and I like the absence of explicit loops.
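A slightly stricter variant of the same idea, as a sketch only: it swaps sapply for vapply so the result is guaranteed to be an integer vector, and anchors the pattern to the file extension so a name that merely contains "pdf" would not be counted. The temporary directory tree and its file names are made up here purely so the example can be run anywhere; with real data you would start from BiblioDir instead.

```r
# Build a small throwaway tree (hypothetical names) so the example is reproducible
root <- file.path(tempdir(), "Biblio_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "01 folder", "nested"), recursive = TRUE)
dir.create(file.path(root, "02 folder"), recursive = TRUE)
file.create(file.path(root, "01 folder", c("a.pdf", "b.pdf", "notes.txt")))
file.create(file.path(root, "01 folder", "nested", "deep.pdf"))  # sub-subfolder, must be ignored
file.create(file.path(root, "02 folder", "data.csv"))

# First-level subdirectories only, as in the question
dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)

# One integer per directory; "\\.pdf$" matches only names ending in ".pdf"
pdf_counts <- vapply(
  dirs,
  function(d) length(list.files(d, pattern = "\\.pdf$", ignore.case = TRUE)),
  integer(1)
)
names(pdf_counts) <- basename(dirs)
pdf_counts
```

In this toy tree "01 folder" gets a count of 2 (the nested file is not counted because list.files is not recursive by default) and "02 folder" gets 0.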
Upvotes: 3
Reputation: 78832
tidyverse:
library(tidyverse)
fils <- list.files("~/Development", pattern="pdf$", full.names = TRUE, recursive = TRUE)
tibble( # data_frame() is deprecated in current tibble releases
dir = dirname(fils)
) %>%
count(dir) %>%
mutate(dir = map_chr(dir, digest::digest)) # you don't need to see my dir names so just remove this from your work
## # A tibble: 14 x 2
## dir n
## <chr> <int>
## 1 06e6c4fed6e941d00c04cae3bd24888b 18
## 2 98bf27d6686a52772cb642a136473d86 9
## 3 c07bfc45ce148933269d7913e1c5e833 1
## 4 84088c9c18b0eb10478f17870886b481 1
## 5 baeb85661aad8bff2f2b52cb55f14ede 1
## 6 c484306deae0a70b46854ede3e6b317a 22
## 7 70750a506855c6c6e09f8bdff32550f8 4
## 8 8c5cbe2598f1f24f1549aaafd77b14c9 1
## 9 9008083601c1a75def1d1418d8acf39e 1
## 10 0c25ef8d27250f211d56eff8641f8beb 1
## 11 3e30987a34a74cb6846abc51e48e7f9e 1
## 12 e71c330b185bf4974d26d5379793671b 1
## 13 fe2e8912e58ba889cf7c6c3ec565b2ee 4
## 14 e07698c59f5c11ac61e927e91c2e8493 27
base:
fils <- list.files("~/Development", pattern="pdf$", full.names = TRUE, recursive = TRUE)
dirs <- dirname(fils)
dirs <- sapply(dirs,digest::digest) # you don't need to see my dir names so just remove this from your work
as.data.frame(table(dirs))
## dirs Freq
## 1 06e6c4fed6e941d00c04cae3bd24888b 18
## 2 0c25ef8d27250f211d56eff8641f8beb 1
## 3 3e30987a34a74cb6846abc51e48e7f9e 1
## 4 70750a506855c6c6e09f8bdff32550f8 4
## 5 84088c9c18b0eb10478f17870886b481 1
## 6 8c5cbe2598f1f24f1549aaafd77b14c9 1
## 7 9008083601c1a75def1d1418d8acf39e 1
## 8 98bf27d6686a52772cb642a136473d86 9
## 9 baeb85661aad8bff2f2b52cb55f14ede 1
## 10 c07bfc45ce148933269d7913e1c5e833 1
## 11 c484306deae0a70b46854ede3e6b317a 22
## 12 e07698c59f5c11ac61e927e91c2e8493 27
## 13 e71c330b185bf4974d26d5379793671b 1
## 14 fe2e8912e58ba889cf7c6c3ec565b2ee 4
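Note that with recursive = TRUE the counts above include files in nested folders, and directories without any PDF do not appear at all. If, as in the question, only the first-level subfolders matter and empty ones should show up with a count of 0, one possible adaptation of the base approach is sketched below. The demo tree and its names are assumptions made only to keep the example runnable; it also assumes the subfolder names are unique, which holds for siblings of one parent directory.

```r
# Throwaway demo tree (hypothetical names)
root <- file.path(tempdir(), "Biblio_demo_table")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "01 folder", "nested"), recursive = TRUE)
dir.create(file.path(root, "02 folder"), recursive = TRUE)
file.create(file.path(root, "01 folder", c("a.pdf", "b.pdf", "notes.txt")))
file.create(file.path(root, "01 folder", "nested", "deep.pdf"))  # not counted below
file.create(file.path(root, "02 folder", "data.csv"))

# First-level subdirectories, then the PDFs directly inside them (no recursion)
dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
fils <- list.files(dirs, pattern = "\\.pdf$", full.names = TRUE)

# A factor with one level per directory makes table() report zero counts too
counts <- as.data.frame(
  table(dir = factor(basename(dirname(fils)), levels = basename(dirs)))
)
counts
```

The factor levels are what keep "02 folder" in the result; a plain table(dirname(fils)) would silently drop it.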
Upvotes: 6
Reputation: 7659
Since you want to count the number of PDF files only, you don't need the file names here, so the third line of your attempted code is unnecessary.
Start with the first two lines
BiblioPath = "C:/Biblio"
BiblioDir = list.dirs(path = BiblioPath, full.names = TRUE, recursive = FALSE)
and then create a dataframe that takes the names of the folders and the PDF counts, such as
x <- data.frame( Dir = BiblioDir, no = 0 )
and update the column with the number of files, calculated via
for( i in seq_along( BiblioDir ) ) x$no[ i ] <- # seq_along() is safe even if BiblioDir is empty
length( list.files(path = BiblioDir[ i ], pattern = "pdf", recursive = FALSE, full.names = TRUE) )
That will give you a data.frame x with the folder names and the number of PDF files per folder.
This is a loop. I am not sure whether "without a loop" in your question was a hard condition, but I don't see any reason not to use a loop here.
Upvotes: 1