C Brown

Reputation: 23

R: Classes of specific columns in list of dataframes

I have a set of Excel files, each containing one sheet of data with a similar structure (mostly; see below), that I ultimately want to combine into one large data frame, with each subset indexed by its original source file. I can create a list of data frames and then merge them into one data frame pretty easily with the following code:

files <- grep("\\.xlsx$", dir(), value = TRUE) # vector of file names ending in .xlsx
IDnos <- substr(files, 20, 24) # vector with the key 5-digit ID of each file

library("XLConnect")
library("data.table")

datalist <- lapply(files, readWorksheetFromFile, sheet = "Data")
names(datalist) <- IDnos
bigdatatable <- rbindlist(datalist, idcol = "IDNo")

One data column, "Value", is usually class numeric. However, in several files an "ND" was entered in one row, which makes that column class character, so in the final data frame the whole column is character.

Although I can fix this with some simple cleaning, I was left wondering whether there is a way to identify, at the "list of dataframes" stage, which files (or which data frame components of the list I created) have class character for column "Value". For example, sapply(datalist, class) and variations of it only return the class of each list element, not of the column. I am hoping to avoid a for-loop.

Is there any way to use lapply or sapply to drill down into dataframes within a list?

Upvotes: 1

Views: 543

Answers (1)

neilfws

Reputation: 33782

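Here's how I would use lapply to find the class of column a in a list of two data frames named x and y. (The output shown is from R before 4.0.0; from 4.0.0 onward, data.frame() no longer converts strings to factors by default, so x reports "character" instead.)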

datalist <- list(x = data.frame(a = letters),
                 y = data.frame(a = 1:26))
lapply(datalist, function(x) class(x$a))

$x
[1] "factor"

$y
[1] "integer"
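
The same idea can be applied to the question's "Value" column. This is only a minimal sketch: the small datalist below is a made-up stand-in for the list read from the Excel files, and the names value_class, bad_ids and bad_entries are hypothetical. It uses sapply to get a named vector of classes, picks out the IDs whose column is not numeric, and then lists the entries that cannot be parsed as numbers.

```r
# Hypothetical stand-in for the list produced by readWorksheetFromFile
datalist <- list(
  "00001" = data.frame(Value = c(1.5, 2.0, 3.2)),
  "00002" = data.frame(Value = c("4.1", "ND", "5.0"), stringsAsFactors = FALSE),
  "00003" = data.frame(Value = c(7, 8, 9))
)

# One class per data frame, returned as a named character vector
value_class <- sapply(datalist, function(df) class(df$Value)[1])

# Names of the list elements (files) whose Value column is not numeric
bad_ids <- names(value_class)[value_class != "numeric"]

# For the flagged data frames, show the entries that are not parseable as numbers
bad_entries <- lapply(datalist[bad_ids], function(df) {
  v <- as.character(df$Value)
  v[!is.na(v) & is.na(suppressWarnings(as.numeric(v)))]
})

print(value_class)
print(bad_ids)
print(bad_entries)
```

Because sapply simplifies its result to a named vector, the offending IDs can be pulled out with ordinary subsetting, with no explicit for-loop.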

Upvotes: 1
