Gabe Spradlin

Reputation: 2057

R: Extracting Data Frame from list of CSV file data

So I come from a background of Matlab and Python (and several others less related). I'm picking up R for a Coursera course.

I followed this SO answer in order to read in all my homework files into a list in a single line of code. My code looks like this:

# Get a list of files
files = list.files(path = dataDir, pattern = '\\.csv$')  # pattern is a regular expression, not a glob

# Import the file data
setwd(dataDir)
data = lapply(files, read.csv)

This all works just fine. However, I am getting an object back that I don't know how to access. I mentioned Matlab and Python before because I've attempted to access the data in all the ways I would in those languages.

Here's what summary outputs:

summary(data)
       Length Class      Mode
  [1,] 4      data.frame list
  [2,] 4      data.frame list
  [3,] 4      data.frame list

There are actually 352 of them, not just 3, but no one needs a listing of all 352. Here's what summary of an individual index outputs:

summary(data[200])
     Length Class      Mode
[1,] 4      data.frame list

So if I enter data[200] I get a listing of the first 2500 rows of data. But data[200, 100] returns an error, as do data[200][,100] and data[200][100,]. data[200][100] returns [[1]] NULL.

While I haven't fully considered what I will need to do for this homework, I'm sure it will involve calculating means, medians, maxima, etc. of all non-NA values in various data columns. This wasn't tough to do for the quizzes using something like mean(data[which(is.na(data$Col1) == F), 'Col6']).

So I imagine I could use a more hackish version of what I need where I simply load the 1 file I need at the time I need it, extract only the portion of the data frame I need right then, and loop over all the data files I need to process. However, I'd rather know how to access the data in the object R creates from the lapply line. I suspect this will make more complex analyses later on much easier to code.

Thanks

Upvotes: 0

Views: 1930

Answers (1)

C_Z_

Reputation: 7796

When you subset a list, single square brackets [ return another list. So, data[200] returns a list of length 1 containing one data frame, because data is a list. Double square brackets [[ give you the actual object contained in the list (in this case, a data frame). Once you have a data frame, you can select its 100th row with [100,] (the first 100 rows would be [1:100,]), which is why the following works:

data[[200]][100,]
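
To make the difference concrete, here is a minimal sketch on a small made-up list. The two data frames and their values are hypothetical stand-ins for what lapply(files, read.csv) returns; only the column names Col1 and Col6 are borrowed from the question. It also shows one possible way to get a per-file mean that skips NA values, using sapply.

```r
# Hypothetical stand-in for the list returned by lapply(files, read.csv)
data <- list(
  data.frame(Col1 = c(1, 2, NA), Col6 = c(10, NA, 30)),
  data.frame(Col1 = c(4, 5, 6),  Col6 = c(40, 50, 60))
)

class(data[2])        # "list": single brackets keep the list wrapper
class(data[[2]])      # "data.frame": double brackets extract the element

data[[2]][3, ]        # third row of the second data frame
data[[2]][1:2, ]      # first two rows of the second data frame
data[[2]][3, "Col6"]  # a single value: 60

# Mean of Col6 in every data frame, ignoring NA values
col6_means <- sapply(data, function(df) mean(df$Col6, na.rm = TRUE))
col6_means            # 20 50
```

Because sapply walks the list for you, there is no need to load one file at a time or write an explicit loop; swapping mean for median or max should work the same way.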

Upvotes: 3
