Reputation: 1
I need to rename 15,000 PDFs using a 15,000-row Excel sheet that lists the old names and the required new names in two corresponding columns.
My excel sheet looks like this: Reference Excel
I am keeping my files in a folder called "SDP Responses", then running the following script:
#Use Readxl and dplyr package
library(readxl)
library(dplyr)
#Set Working Directory as Desktop
setwd("/Users/shwetachopra/Desktop")
#Set path to folder that contains our PDFs as Folder
folder <- "/Users/shwetachopra/Desktop/SDP Responses"
#Create vector of list of PDF names, from folder
oldnames <- as.vector(list.files(folder))
#Export excel of matching newnames in same order as above
sdpdata <- read_excel("NewSDPNames.xlsx")
#Change working directory to folder containing the pdfs
setwd("/Users/shwetachopra/Desktop/SDP Responses")
#Separately extract the column which contain the matching old names
oldies <- sdpdata[[2]]
#filter rows that match old file names
subset <- sdpdata[grep(paste(oldnames, collapse="|"),oldies),]
#extract new file names
new_names <- subset[[3]]
#rename old file names with new ones
file.rename(oldnames,new_names)
The script works when I run it on 100 PDFs, renaming them perfectly. However, as soon as I add even one more file, making 101, it renames them inaccurately, matching the old names to incorrect new names. I have not set any limit in my script, and I am unable to understand why it creates inaccuracies beyond 100 PDFs.
I'm new to R and had to figure much of this out through Google. I would really appreciate some help to ensure that I can rename all 15,000 PDFs together.
Upvotes: 0
Views: 35
Reputation: 9313
Try this:
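library(readxl)
# Set the working directory to the folder that contains the PDFs
setwd("/Users/shwetachopra/Desktop/SDP Responses")
# List the PDFs in the folder (the dot is escaped so it matches a literal ".")
oldnames <- list.files(".", pattern = "\\.pdf$")
# Read the Excel sheet with the 'Old PDF Names' and 'New Names' columns
# (in the question it is read from the Desktop, not from the PDF folder)
sdpdata <- read_excel("/Users/shwetachopra/Desktop/NewSDPNames.xlsx")
# Flag the rows of sdpdata whose old name exists in the folder
subset <- sdpdata$`Old PDF Names` %in% oldnames
# Rename: from and to are taken from the same rows, so each pair stays aligned
file.rename(from = sdpdata$`Old PDF Names`[subset],
            to   = sdpdata$`New Names`[subset])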
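If you want to try name-based pairing without touching the real files first, here is a minimal sketch that runs in a temporary folder. The file names and the `lookup` data frame are made up for illustration and only stand in for the Excel sheet. It uses match() rather than %in%, so it is a variation on the code above, not the same method.

```r
# Throwaway folder with dummy PDFs (hypothetical names)
demo_dir <- tempfile("rename_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("a1.pdf", "a2.pdf", "a10.pdf")))

# Stand-in for the Excel sheet; its row order need not match list.files()
lookup <- data.frame(
  old = c("a1.pdf", "a2.pdf", "a10.pdf"),
  new = c("first.pdf", "second.pdf", "tenth.pdf"),
  stringsAsFactors = FALSE
)

oldnames <- list.files(demo_dir, pattern = "\\.pdf$")

# For each file on disk, match() gives the row of its entry in the lookup
# (NA when the file has no entry)
idx   <- match(oldnames, lookup$old)
found <- !is.na(idx)

# Rename only the files that have an entry, pairing by name rather than position
ok <- file.rename(
  from = file.path(demo_dir, oldnames[found]),
  to   = file.path(demo_dir, lookup$new[idx[found]])
)

list.files(demo_dir)
```

file.rename returns one TRUE/FALSE per file, so `all(ok)` is a quick way to confirm that every rename succeeded.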
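I think that your problem may be due to the last line
file.rename(oldnames,new_names)
file.rename pairs the two vectors by position, so they must have the same length and the same order. Compare the lengths of oldnames and new_names. If they do not match, then you might not have an entry in the Excel sheet for one or more files in the folder. Even when the lengths match, oldnames is in the order returned by list.files, while new_names follows the row order of the sheet, so the pairs can still be misaligned.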
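A sketch of that comparison, using small made-up vectors in place of the list.files() output and the two Excel columns (all names here are hypothetical):

```r
# Hypothetical stand-ins for list.files() output and the two sheet columns
oldnames  <- c("a1.pdf", "a10.pdf", "a2.pdf", "extra.pdf")
sheet_old <- c("a1.pdf", "a2.pdf", "a10.pdf", "missing.pdf")
sheet_new <- c("first.pdf", "second.pdf", "tenth.pdf", "absent.pdf")

# Files in the folder that have no row in the sheet
no_entry <- setdiff(oldnames, sheet_old)

# Rows in the sheet whose file is not in the folder
no_file <- setdiff(sheet_old, oldnames)

# Duplicated target names would make two files compete for the same name
dup_targets <- sheet_new[duplicated(sheet_new)]

# Position-by-position comparison: TRUE only where the two orders agree
n <- min(length(oldnames), length(sheet_old))
same_position <- oldnames[seq_len(n)] == sheet_old[seq_len(n)]
```

If `no_entry` or `no_file` is non-empty, or `same_position` contains any FALSE, renaming by position will pair some files with the wrong new names.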
Upvotes: 1