Reputation: 918
I have numerous data frames for which I would like to apply the same function.
Context: I have data frames which record time windows for subjects, with a 0/1 indicator saying whether an event occurred in that time window. An example:
ID start stop event
1 0 12 0
1 12 24 0
1 24 36 1
1 36 48 1
2 0 12 0
etc. What I have is code which deletes every entry after the first event for each ID; in the above example, that is the record with ID = 1 and start time = 36.
The code for one dataset is (the dataset is called event1, and IDT is the ID column):
list1 <- which(event1$event == 1)
while(length(list1) >= 1){
id <- event1[ list1[ 1 ] , ]$IDT
idplus1 <- event1[ ( list1[1] + 1) , ]$IDT
b <- which( event1$IDT == id )
if( id == idplus1 ){
event1 <- event1[- ( ( list1[1] + 1 ) : b[ length(b) ] ) , ]
}
list1 <- list1[-1]
}
Now if I have four datasets, event1, event2, event3 and event4, and I want to apply this code to each of them, is there a way to define a function to do this? I feel like there should be an opportunity to use lapply here...
Upvotes: 0
Views: 282
Reputation: 58825
I'm going to start by cleaning up your example code because I could not get it to run without error on your example. Whenever you have something that you are doing for each value of some variable ("for each ID", in this case), you are looking at a split-apply-combine problem. My preferred tool for such problems is the plyr package, but it is not the only one. I would re-write your procedure as:
library("plyr")
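ddply(event1, .(ID), function(DF) {
  # position of the first event within this ID (NA if there is none)
  firstevent <- which(DF$event == 1)[1]
  # keep rows up to and including the first event; keep everything if no event
  if (is.na(firstevent)) DF else DF[seq_len(firstevent), ]
})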
This takes event1, splits it up by unique values of ID, and, for each of those, keeps only the records up to and including the first event.
This can be wrapped in a function easily.
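truncevent <- function(event1) {
  ddply(event1, .(ID), function(DF) {
    # position of the first event within this ID (NA if there is none)
    firstevent <- which(DF$event == 1)[1]
    # keep rows up to and including the first event; keep everything if no event
    if (is.na(firstevent)) DF else DF[seq_len(firstevent), ]
  })
}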
With an expanded event1:
event1 <- read.table(text=
"ID start stop event
1 0 12 0
1 12 24 0
1 24 36 1
1 36 48 1
2 0 12 0
2 12 24 1
2 24 36 1", header=TRUE)
we get
> truncevent(event1)
ID start stop event
1 1 0 12 0
2 1 12 24 0
3 1 24 36 1
4 2 0 12 0
5 2 12 24 1
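For comparison, the same truncation can be sketched in base R without plyr. This is only a sketch: it assumes the rows within each ID are already in time order, and the name truncevent_base is made up for illustration.

```r
# Hypothetical base-R counterpart to truncevent();
# assumes rows are ordered by time within each ID.
truncevent_base <- function(df) {
  # number of events seen strictly before each row, counted within each ID
  prior_events <- ave(df$event, df$ID, FUN = function(e) cumsum(e) - e)
  # rows with no earlier event are those up to and including the first event
  out <- df[prior_events == 0, ]
  rownames(out) <- NULL
  out
}

event1 <- read.table(text =
"ID start stop event
1 0 12 0
1 12 24 0
1 24 36 1
1 36 48 1
2 0 12 0
2 12 24 1
2 24 36 1", header = TRUE)

result <- truncevent_base(event1)
print(result)
```

An ID with no event at all has zero prior events on every row, so all of its rows are kept.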
Now we get to the part of your question about iterating over multiple data sets. One approach is to create a vector of data set names, iterate over that, and run the function on each of these.
events <- c("event1", "event2", "event3", "event4")
lapply(events, function(event) {
truncevent(get(event))
})
More natural is to put the events into a list themselves rather than having to keep track of each of the names separately. Then iterating over this list is even simpler.
events <- list(event1, event2, event3, event4)
lapply(events, truncevent)
Both of these approaches will give you back a list of data.frames which are the transformed versions.
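If you want to pull out each result by name afterwards, one option is to name the list elements, since lapply() carries the names through. A small sketch with toy data frames; keep_first_row is a hypothetical stand-in for truncevent, used only so the block runs on its own:

```r
# Toy stand-ins for event1 and event2 so the example is self-contained
event1 <- data.frame(ID = c(1, 1), event = c(0, 1))
event2 <- data.frame(ID = c(2, 2, 2), event = c(0, 0, 1))

# Hypothetical placeholder for truncevent(): keeps only the first row
keep_first_row <- function(df) df[1, , drop = FALSE]

# Naming the elements means the results are named the same way
events <- list(event1 = event1, event2 = event2)
results <- lapply(events, keep_first_row)

names(results)   # "event1" "event2"
results$event2   # the transformed version of event2
```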
Upvotes: 0
Reputation: 109864
Here's how I would approach your problem:
Creating a data set (list of dataframes)
dat1 <- read.table(text="ID start stop event
1 0 12 0
1 12 24 0
1 24 36 1
1 36 48 1
2 12 24 0
2 24 36 1
2 36 48 1
3 0 12 0", header=TRUE)
dat2 <- dat3 <- dat1
dats <- list(dat1, dat2, dat3)
Applying a function to a list of dataframes
#Function to select up to first 1
FUN <- function(x) {
splitx <- split(x, x$ID)
out <- do.call(rbind, lapply(splitx, function(x) {
inds <- c(which(x$event == 0), which(x$event == 1)[1])
na.omit(x[inds, ])
}))
data.frame(out, row.names=NULL)
}
#apply it to all in list
lapply(dats, FUN)
Upvotes: 1
Reputation: 263342
Untested:
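evnt.fn <- function(evnt.df) {
  list1 <- which(evnt.df$event == 1)
  while (length(list1) >= 1) {
    pos <- list1[1]
    id <- evnt.df$IDT[pos]
    b <- which(evnt.df$IDT == id)
    last <- b[length(b)]
    # drop every row for this id that comes after its first event
    if (pos < last) {
      evnt.df <- evnt.df[-((pos + 1):last), ]
    }
    # row positions shift after a deletion, so recompute the remaining events
    list1 <- which(evnt.df$event == 1)
    list1 <- list1[list1 > pos]
  }
  evnt.df  # return the truncated data frame
}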
lapply(list(event1, event2, event3, event4), evnt.fn)
The principle is to make it work on one instance and then wrap it up:
fnname <- function(instance){substitute "instance" for the data object name}
I generally prefer using the "[[" version of "$" but in this instance I don't see a lot of risk in just leaving it in.
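To illustrate the difference with a toy data frame (made up for this example, using the IDT column name from the question): "$" does partial matching on column names, while "[[" matches exactly by default.

```r
df <- data.frame(IDT = c(1, 1, 2), event = c(0, 1, 0))

# "$" partially matches names, so a shortened name silently succeeds
df$ID         # returns the IDT column

# "[[" matches exactly by default, so the same shortened name finds nothing
df[["ID"]]    # NULL
df[["IDT"]]   # the intended column
```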
Upvotes: 1
Reputation: 6104
Here's an example of how to loop through multiple data frames and run the same series of commands on each of them:
# list containing multiple data frames
x <- list( mtcars , iris )
# some function you've defined
myfun <-
function( df ){
# find numeric variables
nv <- sapply( df , is.numeric )
# return the 10th and 90th percentile of each numeric column
sapply( df[ , nv , drop = FALSE ] , quantile , c( 0.1 , 0.9 ) )
}
# run the function across all data frames
lapply( x , myfun )
Upvotes: 1
Reputation: 17189
If your function is called myfunc, then to apply it to the objects event1 to event4 you can use
lapply(paste0('event',1:4), function(x) { eventDF <- get(x) ; myfunc(eventDF) })
Explanation:
paste0('event',1:4) creates a character vector of the object names over which you want to apply the function.
lapply applies the inline function to each element of that character vector.
get(x) returns the object whose name is equal to x.
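A related base R option, if the objects already exist in the workspace, is mget(), which looks up several names at once and returns the objects as a named list. A minimal sketch, with toy data frames and a stand-in myfunc that are assumptions for illustration only:

```r
# Toy objects standing in for the real event data frames
event1 <- data.frame(x = 1:2)
event2 <- data.frame(x = 1:3)

# Stand-in for the real function: just counts rows
myfunc <- function(df) nrow(df)

# mget() fetches each named object and returns them as a named list
event_list <- mget(paste0("event", 1:2))
counts <- lapply(event_list, myfunc)
print(counts)
```

Because the list is named, each result can be picked out as counts$event1, counts$event2, and so on.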
Upvotes: 1