Reputation: 1814
I want to extract only the content inside the parentheses. This works pretty well on a vector (example below):
j<-"[8] Q(+.98)"
gsub("[\\(\\)]", "", regmatches(j, gregexpr("\\(.*?\\)", j))[[1]])
Now I want to use apply to run the above code on multiple columns of a data frame. Below is what I have tried, which does not work.
a<-c("[7] C(+57.02)", "[11] C(+57.02)", NA, NA)
b<- c("[16] C(+57.02)", NA, NA,NA)
c<-c("[9] Q(+.98)" , "[13] Q(+.98)" , "[14] C(+57.02)",NA)
abc<-as.data.frame(rbind(a,b,c))
abc_in<-apply(abc, 2, function(x)
gsub("[\\(\\)]", "", regmatches(x, gregexpr("\\(.*?\\)", x))[[1]]))
Upvotes: 0
Views: 57
Reputation: 99361
You don't need apply() or any packages. Since we're operating on the entire data frame, we can first coerce it to a matrix and then just use sub().
sub(".*\\((.+)\\).*", "\\1", as.matrix(abc))
# V1 V2 V3 V4
# a "+57.02" "+57.02" NA NA
# b "+57.02" NA NA NA
# c "+.98" "+.98" "+57.02" NA
That gives you a matrix back. If you need to retain the data frame structure, then
abc[] <- sub(".*\\((.+)\\).*", "\\1", as.matrix(abc))
Of course, you could loop over the data frame columns. But for that I would go with lapply() over apply(), since a data frame is a list.
abc[] <- lapply(abc, sub, pattern = ".*\\((.+)\\).*", replacement = "\\1")
Coercion is done implicitly by sub(), so starting with factors is not an issue.
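As a small check of that last point, here is a sketch that passes a factor straight to sub(). The vector f is made up for illustration; it stands in for a column that as.data.frame() turned into a factor, which was the default before R 4.0.0.

```r
# Hypothetical factor column, standing in for one column of abc
f <- factor(c("[7] C(+57.02)", "[9] Q(+.98)", NA))

# sub() coerces its input with as.character() before matching,
# so the factor labels (not the integer codes) are searched
res <- sub(".*\\((.+)\\).*", "\\1", f)

print(res)
print(class(res))
```

The result is a plain character vector, and the NA entry stays NA.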
Upvotes: 2
Reputation: 5673
It does what you tell it to, which is to take only the first element of the regmatches list for each column. I advise using str_extract from the stringr package, which returns a vector and is easier to write and use:
library(stringr)
abs_in <- apply(abc,2,function(x){ gsub("[\\(\\)]", "",str_extract(x,"\\(.*?\\)"))})
> abs_in
V1 V2 V3 V4
[1,] "+57.02" "+57.02" NA NA
[2,] "+57.02" NA NA NA
[3,] "+.98" "+.98" "+57.02" NA
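To see why the original call returns too little, it may help to look at what regmatches() gives back for a vector. This is a base R sketch; the vector x and the names m and all_in are made up for illustration.

```r
# Hypothetical input: three strings, each with one parenthesised value
x <- c("[7] C(+57.02)", "[16] C(+57.02)", "[9] Q(+.98)")

# regmatches() with gregexpr() returns a list, one element per input string
m <- regmatches(x, gregexpr("\\(.*?\\)", x))
print(length(m))

# [[1]] keeps only the matches found in x[1] and drops the rest
print(m[[1]])

# Processing every list element instead keeps one result per string
all_in <- sapply(m, function(hit) gsub("[()]", "", hit))
print(all_in)
```

Note that this sketch assumes every string has exactly one match; strings with no match or with NA would need extra handling.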
Upvotes: 1
Reputation: 43179
Using stringr; additionally, you need to specify stringsAsFactors = FALSE when binding the data frame:
abc<-as.data.frame(rbind(a,b,c), stringsAsFactors = FALSE)
library(stringr)
regex <- "\\(([^()]+)\\)"
str_match_all(abc, regex)
This yields
[[1]]
[,1] [,2]
[1,] "(+57.02)" "+57.02"
[2,] "(+57.02)" "+57.02"
[3,] "(+.98)" "+.98"
[[2]]
[,1] [,2]
[1,] "(+57.02)" "+57.02"
[2,] "(+.98)" "+.98"
[[3]]
[,1] [,2]
[1,] "(+57.02)" "+57.02"
[[4]]
[,1] [,2]
[1,] "(NA, NA, NA)" "NA, NA, NA"
Just always take the second column of each matrix, which holds the captured group.
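The same "take the second group" idea can be sketched in base R by swapping regexec() in for str_match_all(). This is not the stringr call used above, only an analogue; the vector x and the names hits and inside are made up for illustration.

```r
# Same pattern as above: one capture group for the text between parentheses
regex <- "\\(([^()]+)\\)"

# Hypothetical input vector, e.g. one column of the data frame
x <- c("[9] Q(+.98)", "[13] Q(+.98)", "[14] C(+57.02)")

# regexec() records the full match and each capture group;
# regmatches() then returns them as c(full_match, group_1) per string
hits <- regmatches(x, regexec(regex, x))

# Element 2 of each result is the capture group
inside <- vapply(hits, `[`, character(1), 2)
print(inside)
```

Unlike str_match_all(), regexec() reports only the first match in each string, which is enough when each cell contains a single parenthesised value.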
Upvotes: 0