useR

Reputation: 101

Subsetting value when condition is met

I have a data frame that records the times at which random events occur. I want to extract the first row in which either 'place' or 'Show' appears under Event together with 'kick' or 'Type' under Event 2. So 'place run' would not satisfy the condition, even though 'place' does appear under Event. By "first", I mean the first such row within each segment, that is, before Time resets to 0. For the first segment, the output I want is 27, since that is the first Time value at which the condition is satisfied. For the second segment, I want 16, and for the last segment, 41. (I've put asterisks around the rows that meet the condition so they are easy to locate; the asterisks are not actually present in the data.)

Time Event  Event 2
 0   Begin   NA
 23  place   run
 27  *Show   Type*
 34  *place  kick*
 41  good    bye
 42  *place  kick*
 0   Begin   NA
 11  Hat     Yellow
 13  Show    Green
 16  *place  kick*
 20  place   hit
 29  sign    redeem
 35  *Show   Type*
 0   Begin   NA
 5   Cream   Glue
 17  Show    Green
 18  Orange  Screen
 30  place   hit
 33  sign    redeem
 41  *Show   Type*
  0  Begin   NA
 ...

EDIT: So far, I'm able to subset the rows that have Show Type or place kick with the following code:

Rows <- Data[(Data[,'Event'] == 'Show' & Data[,'Event 2']== 'Type') |
                  (Data[,'Event'] == 'place' & Data[,'Event 2']== 'kick' ),]

Where I'm struggling is restarting the search for these values each time Time resets to 0. Any help will be greatly appreciated!

Upvotes: 0

Views: 124

Answers (2)

IRTFM

Reputation: 263332

The &-infix-function can be wrapped with the which function to generate a vector of the row numbers where those conditions are met. Then follow that with [1] to get just the first one.

df[ which(df[ , 'Event'] %in% c('place','Show') & df[ ,'Event.2'] %in% c('kick','Type') )[1], ]

Notice that I didn't leave a space between Event and 2, since that would have been parsed by R as two different symbols. The make.names function is used by all the read.* functions to remove invalid punctuation from column names.
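To see that renaming in isolation, here is a minimal sketch. The two-row tab-separated input `txt` is made up for illustration and is not the asker's file; `check.names` is the `read.table` argument that controls whether headers are passed through `make.names`.

```r
# make.names() turns a syntactically invalid name into a valid one
fixed <- make.names("Event 2")
fixed
# [1] "Event.2"

# Hypothetical tab-separated input whose header contains a space
txt <- "Time\tEvent\tEvent 2\n0\tBegin\tNA\n23\tplace\trun"

# Default check.names = TRUE: the header goes through make.names()
checked <- names(read.table(text = txt, header = TRUE, sep = "\t"))
checked
# [1] "Time"    "Event"   "Event.2"

# check.names = FALSE keeps the header as written, space included
raw <- names(read.table(text = txt, header = TRUE, sep = "\t",
                        check.names = FALSE))
raw
# [1] "Time"    "Event"   "Event 2"
```

With `check.names = FALSE` the column would have to be addressed as `df[ , 'Event 2']` or with backticks.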

To make this process reset at each new segment, you would build a segment vector probably with something like segvec= cumsum(df$Time==0), and then probably use the split-apply-combine approach to get values just within the resulting subsets.
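As a small illustration of that segment vector, here is the `Time` column from the question typed in by hand. The sketch assumes every segment starts with a row where `Time` is 0, so the running count of zeros labels the rows by segment.

```r
Time <- c(0, 23, 27, 34, 41, 42, 0, 11, 13, 16, 20, 29, 35,
          0, 5, 17, 18, 30, 33, 41)

# Time == 0 is TRUE at each segment start; cumsum() counts the starts so far
segvec <- cumsum(Time == 0)
segvec
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3

# split() then groups the values by segment label
pieces <- split(Time, segvec)
length(pieces)
# [1] 3
```

The same `segvec` can be handed to `split()` together with the whole data frame rather than a single column.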

Some lightly tested code:

 lapply( split(dat, cumsum(dat[ ,'Time']==0)), 
      function(df){df[ which(df[ ,'Event'] %in% c('place','Show') & 
                             df[ ,'Event.2'] %in% c('kick','Type') )[1], ]})
#------
$`1`
  Time Event Event.2
3   27  Show    Type

$`2`
   Time Event Event.2
10   16 place    kick

$`3`
   Time Event Event.2
20   41  Show    Type

dput(dat)
structure(list(Time = c(0L, 23L, 27L, 34L, 41L, 42L, 0L, 11L, 
13L, 16L, 20L, 29L, 35L, 0L, 5L, 17L, 18L, 30L, 33L, 41L), Event = structure(c(1L, 
6L, 7L, 6L, 3L, 6L, 1L, 4L, 7L, 6L, 6L, 8L, 7L, 1L, 2L, 7L, 5L, 
6L, 8L, 7L), .Label = c("Begin", "Cream", "good", "Hat", "Orange", 
"place", "Show", "sign"), class = "factor"), Event.2 = structure(c(NA, 
7L, 9L, 5L, 1L, 5L, NA, 10L, 3L, 5L, 4L, 6L, 9L, NA, 2L, 3L, 
8L, 4L, 6L, 9L), .Label = c("bye", "Glue", "Green", "hit", "kick", 
"redeem", "run", "Screen", "Type", "Yellow"), class = "factor")), .Names = c("Time", 
"Event", "Event.2"), class = "data.frame", row.names = c(NA, 
-20L))
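
If only the time values are wanted (27, 16 and 41 in the question), one possible follow-up is to return just `Time` from each segment. This is a sketch, not part of the tested code above: `dat` is rebuilt here from the `dput` output with plain character columns so the block runs by itself, and the helper name `first_hit` is made up for illustration.

```r
dat <- data.frame(
  Time    = c(0, 23, 27, 34, 41, 42, 0, 11, 13, 16, 20, 29, 35,
              0, 5, 17, 18, 30, 33, 41),
  Event   = c("Begin", "place", "Show", "place", "good", "place",
              "Begin", "Hat", "Show", "place", "place", "sign", "Show",
              "Begin", "Cream", "Show", "Orange", "place", "sign", "Show"),
  Event.2 = c(NA, "run", "Type", "kick", "bye", "kick",
              NA, "Yellow", "Green", "kick", "hit", "redeem", "Type",
              NA, "Glue", "Green", "Screen", "hit", "redeem", "Type"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: Time of the first matching row in one segment.
# which(...)[1] is NA when nothing matches, so such a segment yields NA.
first_hit <- function(seg) {
  hit <- which(seg$Event %in% c("place", "Show") &
               seg$Event.2 %in% c("kick", "Type"))[1]
  seg$Time[hit]
}

# One value per segment, named by segment number
times <- sapply(split(dat, cumsum(dat$Time == 0)), first_hit)
times
#  1  2  3 
# 27 16 41 
```

A segment with no matching row comes back as `NA` instead of being dropped, which keeps the result aligned with the segment numbers.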

Upvotes: 3

hrbrmstr

Reputation: 78792

Far less succinct (and probably less optimal) than 42-'s, but:

read.table(text="Time Event  Event2
 0   Begin   NA
 23  place   run
 27  *Show   Type*
 34  *place  kick*
 41  good    bye
 42  *place  kick*
 0   Begin   NA
 11  Hat     Yellow
 13  Show    Green
 16  *place  kick*
 20  place   hit
 29  sign    redeem
 35  *Show   Type*
 0   Begin   NA
 5   Cream   Glue
 17  Show    Green
 18  Orange  Screen
 30  place   hit
 33  sign    redeem
 41  *Show   Type*
  0  Begin   NA", header=TRUE, stringsAsFactors=FALSE) -> df

library(dplyr)

df$grp <- 0
df[which(df$Time == 0),]$grp <- 1
df$grp <- cumsum(df$grp)

group_by(df, grp) %>%
  filter(grepl("place|show", Event, ignore.case=TRUE) & grepl("kick|type", Event2, ignore.case=TRUE)) %>%
  slice(1) %>%
  select(-grp)
## Source: local data frame [3 x 4]
## Groups: grp [3]
## 
##     grp  Time  Event Event2
##   <dbl> <int>  <chr>  <chr>
## 1     1    27  *Show  Type*
## 2     2    16 *place  kick*
## 3     3    41  *Show  Type*

Upvotes: 0
