Reputation: 249
So i have an excel file which has data in several sheets that i have to consolidate so i can provide insights from it:
These sheets are named after each month starting from November .....October (in total: 12 sheets)
My code starts out like this:
#List of months to look at
months = c("Novemeber", "December", "January", "February", "March", "April", "May", "June", "July", "August", "September")
What i want to do is match column names from each of these sheets with an empty df (i call it discrepancies) and fetch data to those columns accordingly. My code is like this
discrepancies <-
setNames(
data.frame(matrix(ncol = 12, nrow = 0)),
c(
"Date",
"Officer",
"Case Number",
"Account Number",
"Plan Type",
"Type",
"ID",
"Transaction Amount",
"Code",
"Specialist",
"Transit#",
"Processed Via"
)
)
#Query for each month's data and append to the main dataframe
for (i in months) {
temp <- read_excel(
"G:/Confidental.xlsx",
sheet = i,
col_names = TRUE,
skip = 0
)
temp$`months` <- i
discrepancies <- rbind(discrepancies, temp)
}
This code is taking every field in the sheet compared to just the columns i want and it gets stuck when one sheet has different number of columns than the one in discrepancies df. Any help is appreciated.
Upvotes: 0
Views: 1197
Reputation: 389175
I don't think you need to create an empty dataframe to compare all the columns. Try this approach :
library(readxl)
result <- purrr::map_df(months, ~read_excel("G:/Confidental.xlsx",sheet = .x),
.id = 'months')
This would combine in all the sheets on one dataframe. If some of the column are absent in the sheet this will automatically insert NA
for those columns in those month.
Upvotes: 1
Reputation: 4344
a possible solution would be along the lines of this example:
# verification data.frame
descrepancies <- data.frame(Col1=character(),
Col2=character(),
Col3=character())
# test 1: one column missing
df1 <- data.frame(Col1= c(1,1),
Col3= c(1,1))
# test 2: one column that is not in discrepancies
df2 <- data.frame(Col1= c(2,2),
Col4= c(2,2))
# text 3: all columns are matching
df3 <- data.frame(Col1= c(3,3),
Col2= c(3,3),
Col3= c(3,3))
The steps I used are getting the column names from the test data.frame, create new columns for those that are not in the test data.frame but in discrepancies, select all columns from test data.frame that are in discrepancies. I just ran it 3 times to check all cases and mount one final df to proof it is working
# get column names from descrepancies to check the tests
nd <- colnames(descrepancies)
# run procedure on test 1
nf1 <- colnames(df1)
df1[, nd[!nd %in% nf1]] <- NA
descrepancies <- rbind(descrepancies, df1[, nd])
# run procedure on test 2
nf2 <- colnames(df2)
df2[, nd[!nd %in% nf2]] <- NA
descrepancies <- rbind(descrepancies, df2[, nd])
# run procedure on test 3
nf3 <- colnames(df3)
df3[, nd[!nd %in% nf3]] <- NA
descrepancies <- rbind(descrepancies, df3[, nd])
# print the final df
descrepancies
Col1 Col2 Col3
1 1 NA 1
2 1 NA 1
3 2 NA NA
4 2 NA NA
5 3 3 3
6 3 3 3
Upvotes: 1