sjdh

Reputation: 4006

Matching regular expressions to any of the columns in a dataframe

From a dataframe I want to subset all rows that contain some pattern like "A" or "36" or "1?2". I don't care which column matches the pattern, as long as there is a match somewhere in that row.

Dataframe:

aName   bName   pName   call  alleles   logRatio    strength
AX-11086564 F08_ADN103  2011-02-10_R10  AB  CG  0.363371    10.184215
AX-11086564 A01_CD1919  2011-02-24_R11  BB  GG  -1.352707   9.54909
AX-11086564 B05_CD2920  2011-01-27_R6   AB  CG  -0.183802   9.766334
AX-11086564 D04_CD5950  2011-02-09_R9   AB  CG  0.162586    10.165051
AX-11086564 D07_CD6025  2011-02-10_R10  AB  CG  -0.397097   9.940238
AX-11086564 B05_CD3630  2011-02-02_R7   AA  CC  2.349906    9.153076
AX-11086564 D04_ADN103  2011-02-10_R2   BB  GG  -1.898088   9.872966
AX-11086564 A01_CD2588  2011-01-27_R5   BB  GG  -1.208094   9.239801

My actual data frame contains many columns, and I don't want to hard-code their names. The patterns can be more complicated, so I want to use regular expressions.

Code to read in this dataframe in R:

data <- read.table(textConnection("
aName   bName   pName   call  alleles   logRatio    strength
AX-11086564 F08_ADN103  2011-02-10_R10  AB  CG  0.363371    10.184215
AX-11086564 A01_CD1919  2011-02-24_R11  BB  GG  -1.352707   9.54909
AX-11086564 B05_CD2920  2011-01-27_R6   AB  CG  -0.183802   9.766334
AX-11086564 D04_CD5950  2011-02-09_R9   AB  CG  0.162586    10.165051
AX-11086564 D07_CD6025  2011-02-10_R10  AB  CG  -0.397097   9.940238
AX-11086564 B05_CD3630  2011-02-02_R7   AA  CC  2.349906    9.153076
AX-11086564 D04_ADN103  2011-02-10_R2   BB  GG  -1.898088   9.872966
AX-11086564 A01_CD2588  2011-01-27_R5   BB  GG  -1.208094   9.239801
"), header = TRUE)

Upvotes: 0

Views: 98

Answers (2)

agstudy

Reputation: 121568

Here I define a wrapper around grepl to search a data.frame. It returns the indices of the rows in which at least one column matches the pattern:

search_data_frame <-
  function(patt, data)
    which(vapply(seq_len(nrow(data)),
                 function(i) any(grepl(patt, data[i, ])),
                 logical(1)))

Then you use it:

  data[search_data_frame('36', data), ]

        aName      bName          pName call alleles logRatio  strength
1 AX-11086564 F08_ADN103 2011-02-10_R10   AB      CG 0.363371 10.184215
6 AX-11086564 B05_CD3630  2011-02-02_R7   AA      CC 2.349906  9.153076

Note that I read your data using stringsAsFactors = FALSE; otherwise you should coerce your factors to character first.
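
If the data were read with factor columns, one way to coerce them before searching is sketched below. The toy data frame df is hypothetical and only stands in for the question's data.

```r
# Hypothetical toy data frame with a factor column (not the question's data).
df <- data.frame(id = c("a36", "b12"), value = c(1.5, 2.5),
                 stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character;
# numeric columns are left untouched.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

str(df)
```

After this step the character columns hold the original labels rather than integer factor codes, so a pattern search sees the text you expect.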

Upvotes: 2

jdharrison

Reputation: 30425

You can use grepl, apply, and rowSums:

> rowSums(apply(data, 2, grepl, pattern = "A")) > 0
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> rowSums(apply(data, 2, grepl, pattern = "1?2")) > 0
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> rowSums(apply(data, 2, grepl, pattern = "36")) > 0
[1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE

> out <- rowSums(apply(data, 2, grepl, pattern = "36")) > 0
> data[out,]
        aName      bName          pName call alleles logRatio  strength
1 AX-11086564 F08_ADN103 2011-02-10_R10   AB      CG 0.363371 10.184215
6 AX-11086564 B05_CD3630  2011-02-02_R7   AA      CC 2.349906  9.153076

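Note that apply first coerces the data frame to a matrix with as.matrix, so every column, including the numeric ones, is searched as character.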
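
If you would rather avoid the matrix coercion, a column-wise variant is possible. The sketch below is only one way to do it. The helper name rows_matching and the toy data are assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: TRUE for each row where any column matches the pattern.
rows_matching <- function(pattern, df) {
  # One logical vector per column; as.character() handles factors and numbers.
  hits <- lapply(df, function(col) grepl(pattern, as.character(col)))
  # Combine the per-column vectors with element-wise OR.
  Reduce(`|`, hits, rep(FALSE, nrow(df)))
}

# Toy data for illustration only (not the question's data).
toy <- data.frame(x = c("A1", "B2", "C3"), y = c(36, 7, 8))

keep <- rows_matching("36", toy)
toy[keep, ]
```

Because each column is converted with as.character rather than formatted as part of a matrix, numbers are matched in their unpadded text form, which can matter for patterns anchored with ^ or $.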

Upvotes: 2
