I have a data frame ("GO") in R with 2 columns, "term" and "gene". The "term" column is of type character and has entries like this:
GO_POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION
GO_CARGO_RECEPTOR
GO_MATRIX ...
So every entry starts with GO_ and has _ between the words. I want to delete the GO_ and replace the other _ with spaces.
I tried to fix this with gsub:
GO$term <- gsub('GO', '', GO$term)
GO$term <- gsub('\\_', ' ', GO$term)
The problem is that for example GO_CARGO_RECEPTOR has become CAR RECEPTOR, but I need it to be CARGO RECEPTOR.
I don't know how to write the code in R so that, in this example, only the GO_ at the beginning is deleted and the _ in the middle of the strings are replaced...
Thanks for any help.
Upvotes: 0
Views: 1037
Reputation: 626835
Just in case you need to only replace _ with spaces in strings that start with a specific prefix and drop this prefix, too, you may use a PCRE regex based gsub like
x <- c("GO_POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION","POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION")
gsub("(?:\\G(?!^)|^GO_)([^_]*)_", "\\1 ", x, perl=TRUE)
## => [1] "POSITIVE REGULATION OF VIRAL TRANSCRIPTION"
## [2] "POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION"
See the R demo and the regex demo.
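To see how the pattern treats the kinds of strings in the question, here is a small sketch. The vector name `terms` and its contents are chosen for illustration only. By my reading of the pattern, it needs an underscore after the first word, so a one-word term such as GO_MATRIX would be left unchanged; that is worth checking against your own data.

```r
# Illustrative input: two prefixed terms from the question and one
# term without the GO_ prefix (hypothetical, for contrast).
terms <- c("GO_CARGO_RECEPTOR", "GO_MATRIX", "CARGO_RECEPTOR")

# Same PCRE pattern as above: start at ^GO_ or continue from the end of
# the previous match (\G), then capture one word followed by "_".
pattern <- "(?:\\G(?!^)|^GO_)([^_]*)_"

result <- gsub(pattern, "\\1 ", terms, perl = TRUE)
print(result)
```

If one-word terms need the prefix removed as well, a separate `sub("^GO_", "", ...)` pass is one simple way to handle them.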
Regex details
- (?:\G(?!^)|^GO_) - A non-capturing group that matches either the end of the preceding match (\G(?!^)) or (|) the GO_ substring (prefix) at the start of a line
- ([^_]*) - Capturing group 1 (this value is referred to with \1 from the replacement pattern): any 0 or more chars other than _
- _ - an underscore.
Upvotes: 0
Reputation: 1599
With dplyr::mutate plus some base functions to do the manipulation in the data frame.
library(dplyr)
GO <- GO %>%
  dplyr::mutate(term = base::substring(term, 4),     # remove GO_ (the first three characters)
                term = base::gsub("_", " ", term))   # replace remaining underscores with spaces
> GO
term gene
1 POSITIVE REGULATION OF VIRAL TRANSCRIPTION 0.507617
2 CARGO RECEPTOR 0.991978
3 MATRIX 0.543001
# Example data used above
GO <- data.frame(term = c("GO_POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION",
                          "GO_CARGO_RECEPTOR",
                          "GO_MATRIX"),
                 gene = runif(3))
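Note that substring(term, 4) drops the first three characters of every entry, whether or not they are "GO_". If some entries might lack the prefix, a guarded variant could look like the sketch below. It uses base R only, and the second term ("GOLGI_MEMBRANE") is a made-up example, not from the question's data.

```r
# Hypothetical data: one prefixed term and one term without the GO_ prefix.
GO <- data.frame(term = c("GO_CARGO_RECEPTOR", "GOLGI_MEMBRANE"),
                 stringsAsFactors = FALSE)

# Strip the first three characters only where the entry really starts with "GO_".
has_prefix <- startsWith(GO$term, "GO_")
GO$term <- ifelse(has_prefix, substring(GO$term, 4), GO$term)

# Replace every remaining underscore with a space (literal match, no regex).
GO$term <- gsub("_", " ", GO$term, fixed = TRUE)
print(GO$term)
```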
Upvotes: 0
Reputation: 2867
x <- "GO_CARGO_RECEPTOR"
gsub("_", " ", sub("^GO_", "", x))
[1] "CARGO RECEPTOR"
Just use sub instead of gsub for the "GO_" and gsub for the rest.
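Applied to the whole column, this could look like the following sketch. The data frame is rebuilt from the terms in the question, with the gene column left out for brevity.

```r
# Terms copied from the question; the gene column is omitted here.
GO <- data.frame(
  term = c("GO_POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION",
           "GO_CARGO_RECEPTOR",
           "GO_MATRIX"),
  stringsAsFactors = FALSE
)

# Inner call: sub() with ^ removes "GO_" only at the start of each string,
# so the "GO" inside "CARGO" is left alone.
# Outer call: gsub() turns every remaining underscore into a space.
GO$term <- gsub("_", " ", sub("^GO_", "", GO$term))
print(GO$term)
```

The anchor ^ is what fixes the original attempt: an unanchored 'GO' pattern with gsub matches everywhere, including inside CARGO.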
Upvotes: 1