Reputation:
I know this is probably very easy to solve, but after looking at various examples online I could not find one that solves my problem.
I have a data.frame with a column containing strings like these:
ID
p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI
p_HHDU;o_WWj;l_WWOJ;g_jjjDI
I would like to keep only two tokens, the one that starts with p_ and the one that starts with g_, and remove everything between them. Do you have any suggestions on how to do this? I have been trying with gsub, but without success so far.
Thanks a lot in advance.
Upvotes: 0
Views: 147
Reputation: 887621
We can use sub
sub(";*(p_\\w+).*;*(g_\\w+).*", "\\1;\\2", df1$ID)
#[1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI"
Or with gsub
gsub("[^pg]_\\w+;", "", df1$ID)
#[1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI"
where df1 is
df1 <- structure(list(ID = c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI",
                             "p_HHDU;o_WWj;l_WWOJ;g_jjjDI")),
                 .Names = "ID", class = "data.frame", row.names = c(NA, -2L))
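The gsub pattern above relies on every unwanted token having a one-character prefix other than p or g. A more general variant is sketched below. It assumes tokens are separated by ";" and contain no ";" themselves, and it uses a PCRE negative lookahead (perl = TRUE). drop_other_tokens is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: remove every ";"-separated token that does not start
# with "p_" or "g_", whatever the length of its prefix.
drop_other_tokens <- function(x) {
  # (?:^|;)    a token boundary: start of string or a separator
  # (?!p_|g_)  ...not followed by one of the prefixes we want to keep
  # [^;]*      the rest of that token
  out <- gsub("(?:^|;)(?!p_|g_)[^;]*", "", x, perl = TRUE)
  # If the first token was dropped, a leading ";" is left behind; trim it.
  sub("^;", "", out)
}

ID <- c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI", "p_HHDU;o_WWj;l_WWOJ;g_jjjDI")
drop_other_tokens(ID)
```

Unlike the sub version, this keeps every p_/g_ token in its original order instead of exactly one of each, so it also works when g_ comes before p_.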
Upvotes: 0
Reputation: 3053
I suggest using the stringr package, which makes this easy:
library(stringr)
a <- "p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI"
b <- "p_HHDU;o_WWj;l_WWOJ;g_jjjDI"
str_extract(string = a, pattern = c("p_[a-zA-Z]+", "g_[a-zA-Z]+"))
# [1] "p_IIJSJ" "g_jjjdI"
str_extract(string = b, pattern = c("p_[a-zA-Z]+", "g_[a-zA-Z]+"))
# [1] "p_HHDU" "g_jjjDI"
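To apply the same idea to the whole ID column, the two extractions can be done separately and pasted together. The sketch below does this in base R so it has no dependency; extract_first is a hypothetical helper that mimics str_extract (first match of the pattern, NA when nothing matches).

```r
# Hypothetical helper: base-R stand-in for stringr::str_extract(),
# returning the first match of `pattern` in each element, or NA if none.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)          # start of the first match, -1 if none
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)  # regmatches() returns only the matched elements
  out
}

ID <- c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI", "p_HHDU;o_WWj;l_WWOJ;g_jjjDI")
short <- paste(extract_first(ID, "p_[a-zA-Z]+"),
               extract_first(ID, "g_[a-zA-Z]+"), sep = ";")
short
```

With stringr loaded, str_extract(ID, "p_[a-zA-Z]+") plays the same role as extract_first(). Note that paste() turns a missing match into the string "NA", so rows lacking a p_ or g_ token may need a check first.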
Upvotes: 1
Reputation: 51592
An approach with strsplit:
x <- "p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI"
sapply(strsplit(x, ';'), function(i) paste(grep('p_|g_', i, value = TRUE), collapse = ';'))
#[1] "p_IIJSJ;g_jjjdI"
Or, if the tokens are always in the same order (as @Jaap mentions), select the 1st and 4th by position:
sapply(strsplit(df1$ID, ';'), function(x) paste(x[c(1, 4)], collapse = ';'))
#[1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI"
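Both strsplit variants can be generalised to any set of prefixes. The sketch below assumes tokens are separated by a single ";" and that a token should be kept when it starts with one of the given prefixes, which is stricter than grep('p_|g_', ...), since that also matches p_ or g_ in the middle of a token. keep_prefixes is a hypothetical helper name, and startsWith() requires R >= 3.3.0.

```r
# Hypothetical helper: keep only the ";"-separated tokens that start with
# one of the given prefixes, preserving their original order.
keep_prefixes <- function(x, prefixes = c("p_", "g_"), sep = ";") {
  tokens_list <- strsplit(x, sep, fixed = TRUE)
  vapply(tokens_list, function(tokens) {
    # TRUE for tokens that start with any of the prefixes
    keep <- Reduce(`|`, lapply(prefixes, function(p) startsWith(tokens, p)))
    paste(tokens[keep], collapse = sep)
  }, character(1))
}

df1 <- data.frame(ID = c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI",
                         "p_HHDU;o_WWj;l_WWOJ;g_jjjDI"),
                  stringsAsFactors = FALSE)
df1$ID_short <- keep_prefixes(df1$ID)
df1$ID_short
```

vapply() is used instead of sapply() so the result is guaranteed to be a character vector, one element per row.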
Upvotes: 2