Shweta Chopra

Reputation: 1

Mismatches while attempting to rename PDFs using an Excel sheet for reference in R

I need to rename 15,000 PDFs by referencing a 15,000-row Excel sheet that contains both the Old Names and the required New Names in two corresponding columns.

My Excel sheet looks like this: [image: Reference Excel]

I keep my files in a folder called SDP Responses, then run the following script:

#Use Readxl and dplyr package
library(readxl)
library(dplyr)

#Set Working Directory as Desktop
setwd("/Users/shwetachopra/Desktop")

#Set path to folder that contains our PDFs as Folder 
folder <- "/Users/shwetachopra/Desktop/SDP Responses"

#Create vector of list of PDF names, from folder
oldnames <- as.vector(list.files(folder))

#Export excel of matching newnames in same order as above
sdpdata <- read_excel("NewSDPNames.xlsx")

#Change working directory to folder containing the pdfs
setwd("/Users/shwetachopra/Desktop/SDP Responses")

#Separately extract the column which contain the matching old names
oldies <- sdpdata[[2]]

#filter rows that match old file names
subset <- sdpdata[grep(paste(oldnames, collapse="|"),oldies),]

#extract new file names
new_names <- subset[[3]]

#rename old file names with new ones
file.rename(oldnames,new_names)

The script works when I run it on 100 PDFs, renaming them perfectly. However, as soon as I add even one more file, at 101, it renames them inaccurately, matching the old names to incorrect New Names. I have not set any limit within my script and am unable to understand why it creates inaccuracies beyond 100 PDFs.

I'm new to R and had to figure much of this out through Google. I would really appreciate some help to ensure that I can rename all 15,000 PDFs together.

Upvotes: 0

Views: 35

Answers (1)

R. Schifini

Reputation: 9313

Try this:

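library(readxl)

# Set the working directory to the folder that contains the PDFs
setwd("/Users/shwetachopra/Desktop/SDP Responses")

# List the PDFs in that folder (the dot is escaped so it matches literally)
oldnames <- list.files(".", pattern = "\\.pdf$")

# Read the Excel sheet with the 'Old PDF Names' and 'New Names' columns
# (in the question it sits on the Desktop, not in the PDF folder)
sdpdata <- read_excel("/Users/shwetachopra/Desktop/NewSDPNames.xlsx")

# Flag the rows of sdpdata whose old name is actually present in the folder
subset <- sdpdata$`Old PDF Names` %in% oldnames

# Rename, taking the old and new name from the same row
file.rename(from = sdpdata$`Old PDF Names`[subset],
            to = sdpdata$`New Names`[subset])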

I think that your problem may be due to the last line

file.rename(oldnames,new_names)

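Compare the length of oldnames with the length of new_names. If they do not match, then you might not have an entry in the Excel sheet for one or more files in the folder. Even when the lengths do match, file.rename pairs the two vectors by position: oldnames is in the order returned by list.files, while new_names is in the row order of the Excel sheet, so any difference between the two orderings assigns files the wrong new names. Taking both the old and the new name from the same row, as in the code above, avoids this.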
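
If you want to start from the files on disk instead and also see which ones have no entry in the sheet, match() makes the pairing explicit. This is only a minimal sketch: the temporary folder, the file names and the lookup data frame are made up for illustration, and with the real data lookup$old and lookup$new would be the 'Old PDF Names' and 'New Names' columns read from the Excel sheet.

```r
# Self-contained demo in a temporary folder (hypothetical file names)
demo_dir <- tempfile("rename_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("a1.pdf", "a2.pdf", "a10.pdf"))))

# Stand-in for the data read from Excel; its row order need not agree
# with the order in which list.files() returns the files
lookup <- data.frame(
  old = c("a1.pdf", "a2.pdf", "a10.pdf"),
  new = c("first.pdf", "second.pdf", "tenth.pdf"),
  stringsAsFactors = FALSE
)

oldnames <- list.files(demo_dir, pattern = "\\.pdf$")

# For each file on disk, find the row of lookup that holds its old name
# (exact string comparison; NA when the sheet has no such entry)
idx <- match(oldnames, lookup$old)

# Files in the folder that have no row in the sheet
missing_in_sheet <- oldnames[is.na(idx)]

# Rename only the matched files, each with the new name from its own row
ok <- !is.na(idx)
renamed <- file.rename(from = file.path(demo_dir, oldnames[ok]),
                       to   = file.path(demo_dir, lookup$new[idx[ok]]))
```

Because match() compares whole strings rather than building one large regular expression, a name such as a1.pdf cannot accidentally match a10.pdf, and inspecting missing_in_sheet before renaming shows which files would be left untouched.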

Upvotes: 1
