ML33M
ML33M

Reputation: 415

How to read in excel sheets into one data frame in R and skip certain lines

I'm trying to read in an excel file with multiple sheets using R and merge them all into one data frame , tag the sheet name to one column of the dataframe.

Then this time I got a problem that the excel sheets contain 1 extra row for unnecessary titles so I want to skip row 1.

Using read_excel in lapply, I naturally think of just adding skip=1 such as

mylist <-lapply(excel_sheets(path), read_excel(skip=1)

Then R complained about path, and if I keep adding path, it complains that read_excel is not a function. So I thought okay write that using function(x){}

And this totally screwed up. The list generated had a subtle error that I only figured out when I'm plotting the data : it copy and pasted the same sheet 1 multiple times and added the correct sheet names on duplicated data.

Sure I can manually deleting the 1st row, but I want to know where I made the mistake and how to fix it.

library(readxl)

#read in excel sheets
#but now I need to skip one line
path <- "/Users/xxx/file.xlsx"
sheetnames <- excel_sheets(path)
mylist <- lapply(excel_sheets(path), function(x){read_excel(path= path,col_names = TRUE,skip = 1)})

# name the dataframes
names(mylist) <- sheetnames

#use Map to bind all the elements of the list into a dataframe
my_list <- Map(cbind, mylist, Cluster = names(mylist))
df <- do.call("rbind", my_list)

Upvotes: 1

Views: 3605

Answers (2)

jruizri
jruizri

Reputation: 34

try datapasta package. It paste what you select.

Click the example here

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389275

In read_excel function you are not passing the sheet that you want to read which is present in sheetnames variable. Try the following :

library(readxl)
path <- "/Users/xxx/file.xlsx"
sheetnames <- excel_sheets(path)
mylist <- lapply(sheetnames, function(x) 
                 read_excel(path,x, col_names = TRUE,skip = 1))
#col_names is TRUE by default so you can use this without anonymous function like
#mylist <- lapply(sheetnames, read_excel, path = path, skip = 1)

# name the dataframes
names(mylist) <- sheetnames

#use Map to bind all the elements of the list into a dataframe
my_list <- Map(cbind, mylist, Cluster = names(mylist))
df <- do.call("rbind", my_list)

Upvotes: 4

Related Questions