user3141121

Reputation: 490

Extracting part of column if pattern found in a data.table

I have a data.table and I would like to put part of a column into a vector if a pattern is found in other columns. For example, I have the following data.table:

library(data.table)
df <- fread('./file')
df

   V1   V2  V3 V4 V5  V6 V7 V8 V9
1:  0 -148 -49 -1  X CAT  5  0 NA
2:  1 -147 -49  5  X FOT 12  0 NA
3:  2 -146 -49  3  X FAT 53  0 NA
4:  3 -145 -48 -2  X BYE 10  0 NA
5:  4 -144 -48  0  X GOO  2  0 NA

I want to extract the values from V7 that exist between a set of patterns and to put the values from V7 into a vector.

The starting patterns are these:

V2 == -147 & V4 == 5 & V6 == 'FOT'

The ending patterns are these:

V4 == -2 & V6 == 'BYE' 

If both patterns are found, extract the values of V7 from the start row through the end row, inclusive. Here, 12, 53 and 10 should be put into a vector (x).

Upvotes: 0

Views: 125

Answers (2)

Arun

Reputation: 118889

One way I could think of is to use which=TRUE:

# DT is the data.table from the question (called df there)
start = DT[V2 == -147 & V4 == 5 & V6 == 'FOT', which=TRUE]  # [1] 2
end   = DT[V4 == -2 & V6 == 'BYE', which=TRUE]              # [1] 4

DT[start:end, V7]
# [1] 12 53 10

Note that if there are multiple matches, all matching indices are returned, so you may need to pick the start and end values that belong together. There is also the case where a pattern returns no match at all. I'll leave it to you to iron out these edge cases.
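
For what it's worth, here is a minimal sketch of one way those edge cases could be handled. The helper extract_between() is hypothetical (it is not part of data.table), and the policy it applies is an assumption: take the first start match, pair it with the first end match at or after it, and return an empty vector when either side has no usable match. The data is rebuilt inline from the question so the snippet runs on its own.

```r
library(data.table)

# Example data copied from the question.
DT <- data.table(
  V1 = 0:4,
  V2 = -148:-144,
  V3 = c(-49, -49, -49, -48, -48),
  V4 = c(-1, 5, 3, -2, 0),
  V5 = "X",
  V6 = c("CAT", "FOT", "FAT", "BYE", "GOO"),
  V7 = c(5, 12, 53, 10, 2),
  V8 = 0,
  V9 = NA
)

# Hypothetical helper (assumed policy, see above):
# first start match, then the first end match at or after it.
extract_between <- function(dt, starts, ends, col = "V7") {
  if (length(starts) == 0L) return(dt[[col]][0L])  # no start match
  s <- starts[1L]
  e <- ends[ends >= s]                             # ignore ends before the start
  if (length(e) == 0L) return(dt[[col]][0L])       # no usable end match
  dt[[col]][s:e[1L]]
}

starts <- DT[V2 == -147 & V4 == 5 & V6 == "FOT", which = TRUE]
ends   <- DT[V4 == -2 & V6 == "BYE", which = TRUE]

x <- extract_between(DT, starts, ends)
x
# [1] 12 53 10
```

If you need every start/end pair rather than just the first one, the same pairing step could be applied to each element of starts in turn.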

Upvotes: 2

talat

Reputation: 70336

This should do it:

n <- min(which(df$V2 == -147 & df$V4 == 5 & df$V6 == 'FOT'))  # first row matching the start pattern

m <- max(which(df$V4 == -2 & df$V6 == 'BYE'))  # last row matching the end pattern

x <- df$V7[n:m]

x
#[1] 12 53 10

Upvotes: 1
