Reputation: 1396
I'm having difficulty figuring out how to subset specific data from data frames stored in a list. I've read numerous articles on this site, as well as the UCLA and Advanced R material on subsetting, and I'm just not making any progress.
My function takes arguments that identify which data I want to pull out across a range of files: for example, dat1, dat2 or dat3 from files 1:15 in a directory of files (1:999).
Using lapply and read.csv, I have read all of my files (1:15) into a list of data frames.
x <- lapply(directory[id], function(i) {
  read.csv(i, header = TRUE)
})
Here is what str(x) shows (first element only):
List of 15
$ :'data.frame': 1461 obs. of 4 variables:
..$ DateObv : Factor w/ 1461 levels "2003-01-01","2003-01-02",..: 1 2 3 4 5 6 7 8 9 10 ...
..$ dat1: num [1:1461] NA NA NA NA NA NA NA NA NA NA ...
..$ dat2: num [1:1461] NA NA NA NA NA NA NA NA NA NA ...
..$ ID : int [1:1461] 1 1 1 1 1 1 1 1 1 1 ...
So, through the arguments to my function, I want to tell it to give me dat1 from files 1:15, and then I'll take the mean of the results.
I thought I could use another lapply to pull dat1 out into a vector, but it keeps returning NULL or "list()", or it fails with errors saying the object cannot be subset or that subset is missing an argument. I've tried both subset and bracket notation.
How would you recommend subsetting the list of data frames so that I get all the dat1 (or dat2) values back in a single vector that I can take the mean of?
Thank you for your time and consideration.
Upvotes: 0
Views: 178
Reputation: 4121
Create a similar data set:
> x = list(data.frame(dat1 = 1:3,dat2=10), data.frame(dat1 = 2:4,dat2=10))
> str(x)
List of 2
$ :'data.frame': 3 obs. of 2 variables:
..$ dat1: int [1:3] 1 2 3
..$ dat2: num [1:3] 10 10 10
$ :'data.frame': 3 obs. of 2 variables:
..$ dat1: int [1:3] 2 3 4
..$ dat2: num [1:3] 10 10 10
Use lapply to select the variable dat1:
> lapply(x, function(X) X$dat1)
[[1]]
[1] 1 2 3
[[2]]
[1] 2 3 4
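Since the question mentions passing the column name in as a function argument, here is a small sketch of one likely cause of the NULL results. It assumes the name arrives as a character string in a variable (called `col` here, a hypothetical name not taken from the question): `$` does not evaluate its right-hand side, whereas `[[` does.

```r
# Toy data with the same shape as in this answer
x <- list(data.frame(dat1 = 1:3, dat2 = 10),
          data.frame(dat1 = 2:4, dat2 = 10))

col <- "dat1"  # hypothetical variable holding the column name

# `$` looks for a column literally named "col", so every element is NULL
wrong <- lapply(x, function(X) X$col)

# `[[` evaluates its argument, so the name stored in `col` is used
right <- lapply(x, function(X) X[[col]])

# Equivalent shorthand: pass `[[` itself as the function to apply
also_right <- lapply(x, `[[`, col)
```

If the column name is fixed and typed literally, `X$dat1` as shown above works fine; `[[` only matters once the name is held in a variable.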
Bind the resulting list into a vector with c, call mean on the resulting vector, and add na.rm=TRUE to remove the NA values:
> mean(do.call(c, lapply(x, function(X) X$dat1)),na.rm=TRUE)
[1] 2.5
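The same steps can be wrapped in a small helper that takes the column name as an argument, which seems to be what the question is after. This is only a sketch: `pooled_mean`, `dfs` and `col` are made-up names, and `unlist` is used here in place of `do.call(c, ...)` to flatten the list.

```r
# Hypothetical helper: pooled mean of one column across a list of data frames
pooled_mean <- function(dfs, col) {
  # Extract the column from each data frame, then flatten into one vector
  values <- unlist(lapply(dfs, `[[`, col), use.names = FALSE)
  # Drop NA values before averaging
  mean(values, na.rm = TRUE)
}

# Toy data with one NA, to show that missing values are skipped
x <- list(data.frame(dat1 = c(1, NA, 3), dat2 = 10),
          data.frame(dat1 = 2:4, dat2 = 10))

pooled_mean(x, "dat1")  # mean of 1, 3, 2, 3, 4
pooled_mean(x, "dat2")  # every value is 10
```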
Upvotes: 0
Reputation: 1928
I love plyr for this sort of thing. I would do something like this if you want the mean for each data.frame:
library(plyr)
ldply(x, summarize, Mean = mean(dat1, na.rm = TRUE))
or, if you want a long vector of all the dat1 columns and you want to take the mean of all of them, I'd still use plyr but do this:
x <- rbind.fill(x)
mean(x$dat1, na.rm = TRUE)
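For anyone without plyr installed, a rough base R sketch of the same two ideas follows. It uses toy data rather than the asker's files, and it assumes every data frame has the same columns, since `do.call(rbind, ...)` does not fill in missing columns the way `rbind.fill` does.

```r
# Toy data: the first data frame has a missing value
x <- list(data.frame(dat1 = c(1, NA, 3), dat2 = 10),
          data.frame(dat1 = c(2, 4, 6), dat2 = 10))

# One mean per data frame, ignoring NA values
per_file <- sapply(x, function(d) mean(d$dat1, na.rm = TRUE))

# Stack all rows into one data frame, then take a single pooled mean
stacked <- do.call(rbind, x)
pooled <- mean(stacked$dat1, na.rm = TRUE)

mean(per_file)  # mean of the per-file means
pooled          # mean over all non-missing values
```

The two numbers differ here because the data frames contribute different counts of non-missing values, so it is worth deciding which of the two the analysis actually calls for.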
Upvotes: 1