dafyddPrys

Reputation: 918

Applying a function to more than one data frame (R)

I have numerous data frames for which I would like to apply the same function.

Context: I have data frames which record time windows for subjects, with a 0/1 indicator saying whether an event occurred in that time window. An example:

ID start stop event
1  0     12   0
1  12    24   0 
1  24    36   1
1  36    48   1
2  0     12   0 

etc. What I have is a function which deletes every entry after the first event for each ID; in the above example, that is the record for ID = 1, start time = 36.

The code for one dataset is below (the dataset is called event1, and its ID column is named IDT):

list1 <- which(event1$event == 1)

while(length(list1) >= 1){

  id <- event1[ list1[ 1 ] , ]$IDT
  idplus1 <- event1[ ( list1[1] + 1) , ]$IDT
  b <- which( event1$IDT == id )

  if( id == idplus1 ){ 

     event1 <- event1[- ( ( list1[1] + 1 ) : b[ length(b) ] ) , ]   
   }

 list1 <- list1[-1]  

}

Now, if I have four datasets (event1, event2, event3, event4) and I want to apply this function to each of them, is there a way to define a function to do this? I feel like there should be an opportunity to use lapply here...

Upvotes: 0

Views: 282

Answers (5)

Brian Diggs

Reputation: 58825

I'm going to start by cleaning up your example code because I could not get it to run without error on your example. Whenever you have something that you are doing for each value of some variable ("for each ID", in this case), you are looking at a split-apply-combine problem. My preferred tool for such problems is the plyr package, but it is not the only one. I would re-write your procedure as:

library("plyr")
ddply(event1, .(ID), function(DF) {
  firstevent <- which(DF$event == 1)[1]
  # keep every row when this ID has no event at all
  if (is.na(firstevent)) DF else DF[seq_len(firstevent), ]
})

This takes event1, splits it up by unique values of ID, and, for each of those, keeps only the records up to and including the first event.

This can be wrapped in a function easily.

truncevent <- function(event1) {
  ddply(event1, .(ID), function(DF) {
    firstevent <- which(DF$event == 1)[1]
    # keep every row when this ID has no event at all
    if (is.na(firstevent)) DF else DF[seq_len(firstevent), ]
  })
}

With an expanded event1:

event1 <- read.table(text=
"ID start stop event
1  0     12   0
1  12    24   0 
1  24    36   1
1  36    48   1
2  0     12   0
2  12    24   1
2  24    36   1", header=TRUE)

we get

> truncevent(event1)
  ID start stop event
1  1     0   12     0
2  1    12   24     0
3  1    24   36     1
4  2     0   12     0
5  2    12   24     1
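
plyr is not required for this step. As a minimal sketch in base R, assuming the rows are already sorted by time within each ID and that event is coded 0/1, ave() with cumsum() can count the events seen before each row, and only rows with no earlier event are kept. The name truncevent_base is made up for illustration; it does not come from any package.

```r
# Hypothetical base-R helper (name chosen for illustration only).
# Assumes rows are ordered by start time within each ID and event is 0/1.
truncevent_base <- function(df) {
  # Number of events strictly before each row, counted within its own ID
  prior <- ave(df$event, df$ID, FUN = function(e) cumsum(e) - e)
  out <- df[prior == 0, ]
  rownames(out) <- NULL
  out
}

event1 <- read.table(text =
"ID start stop event
1  0     12   0
1  12    24   0
1  24    36   1
1  36    48   1
2  0     12   0
2  12    24   1
2  24    36   1", header = TRUE)

truncevent_base(event1)
```

Unlike ddply, this keeps the rows in their original order rather than grouping them by ID, and an ID with no event at all is returned unchanged.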

Now we get to the part of your question about iterating over multiple data sets. One approach is to create a vector of data set names, iterate over that, and run the function on each of these.

events <- c("event1", "event2", "event3", "event4")
lapply(events, function(event) {
  truncevent(get(event))
})

More natural is to put the events into a list themselves rather than having to keep track of each of the names separately. Then iterating over this list is even simpler.

events <- list(event1, event2, event3, event4)
lapply(events, truncevent)

Both of these approaches will give you back a list of data.frames which are the transformed versions.
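
If the list is given names, the results can be pulled out by name afterwards. The sketch below is meant to be self-contained, so it uses a small base-R stand-in for truncevent (an assumption for illustration, not the plyr version) and two made-up data sets.

```r
# Stand-in for truncevent so the example runs without plyr (illustrative only)
truncevent <- function(df) {
  pieces <- lapply(split(df, df$ID), function(d) {
    first <- which(d$event == 1)[1]
    # keep all rows when the ID has no event
    if (is.na(first)) d else d[seq_len(first), ]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Two made-up data sets in the same layout as the question
event1 <- data.frame(ID = c(1, 1, 1, 2), start = c(0, 12, 24, 0),
                     stop = c(12, 24, 36, 12), event = c(0, 1, 1, 0))
event2 <- data.frame(ID = c(1, 1, 2, 2), start = c(0, 12, 0, 12),
                     stop = c(12, 24, 12, 24), event = c(1, 1, 0, 0))

# Naming the list elements carries the names through lapply
events <- list(event1 = event1, event2 = event2)
results <- lapply(events, truncevent)

names(results)    # same names as the input list
results$event2    # truncated version of event2
```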

Upvotes: 0

Tyler Rinker

Reputation: 109864

Here's how I would approach your problem:

Creating a data set (list of dataframes)

dat1 <- read.table(text="ID start stop event
1  0     12   0
1  12    24   0 
1  24    36   1
1  36    48   1
2  12    24   0 
2  24    36   1
2  36    48   1
3  0     12   0", header=TRUE)

dat2 <- dat3 <- dat1
dats <- list(dat1, dat2, dat3)

Applying a function to a list of dataframes

#Function to select up to first 1
FUN <- function(x) {
    splitx <- split(x, x$ID)
    out <- do.call(rbind, lapply(splitx, function(x) {
        inds <- c(which(x$event == 0), which(x$event == 1)[1])
        na.omit(x[inds, ])
    }))
    data.frame(out, row.names=NULL)
}

#apply it to all in list
lapply(dats, FUN)

Upvotes: 1

IRTFM

Reputation: 263342

Untested:

evnt.fn <- function(evnt.df) {
  keep <- rep(TRUE, nrow(evnt.df))
  for (i in which(evnt.df$event == 1)) {
    # rows belonging to the same subject as event row i
    same.id <- which(evnt.df$IDT == evnt.df$IDT[i])
    # drop that subject's rows that come after the event
    keep[same.id[same.id > i]] <- FALSE
  }
  evnt.df[keep, ]
}

lapply(list(event1, event2, event3, event4), evnt.fn)

The principle is to make it work on one instance and then wrap it up:

fnname <- function(instance) {
  # the working code goes here, with "instance" substituted for the data object name
}

I generally prefer using the "[[" version of "$" but in this instance I don't see a lot of risk in just leaving it in.
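
For context on that preference: "$" does partial matching on names, whereas "[[" matches exactly by default, so a shortened column name can silently pick up a column with "$" but returns NULL with "[[". A small illustration with a made-up data frame:

```r
# Made-up data frame, for illustration only
df <- data.frame(IDT = 1:3, event = c(0, 1, 0))

df$ev          # partial match: returns the event column
df[["ev"]]     # exact match by default: returns NULL
df[["event"]]  # the full name works with either form
```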

Upvotes: 1

Anthony Damico

Reputation: 6104

Here's an example of how to loop through multiple data frames and run the same series of commands on all of them:

# list containing multiple data frames
x <- list( mtcars , iris )

# some function you've defined
myfun <-
    function( df ){

        # find numeric variables
        nv <- sapply( df , is.numeric )

        # return the 10th and 90th percentile of each numeric column
        # (drop = FALSE keeps a data frame even if only one column is numeric)
        sapply( df[ , nv , drop = FALSE ] , quantile , c( 0.1 , 0.9 ) )

    }

# run the function across all data frames
lapply( x , myfun )

Upvotes: 1

CHP

Reputation: 17189

If your function is called myfunc, then to apply it on objects event1 to event4 you can use

lapply(paste0('event',1:4), function(x)  { eventDF <- get(x) ; myfunc(eventDF)   })

Explanation:

paste0('event',1:4) creates a character vector of the object names over which you want to apply the function.

lapply applies the inline function to each element of that character vector.

get(x) returns the object whose name is equal to x.
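
A related option is mget(), which fetches several objects by name in one call and returns them as a named list. A minimal sketch, with a made-up myfunc and toy data frames standing in for the real ones:

```r
# Toy objects standing in for event1 ... event4 (illustrative only)
event1 <- data.frame(ID = 1:2, event = c(0, 1))
event2 <- data.frame(ID = 1:3, event = c(1, 0, 0))
event3 <- data.frame(ID = 1:4, event = c(0, 0, 0, 0))
event4 <- data.frame(ID = 1,   event = 1)

# Placeholder for whatever function you want to apply
myfunc <- function(df) sum(df$event)

# mget() looks the names up in the given environment (here, the current one)
# and returns the objects as a list named after them
dfs <- mget(paste0("event", 1:4), envir = environment())
lapply(dfs, myfunc)
```

Because the list is named, each result stays labelled with the data frame it came from.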

Upvotes: 1
