user6579764

How can I select two characters in a string?

I know this is probably easy to solve, but after looking through various examples online I have not found one that fits my problem.

I have a data.frame with a column containing strings like the following:

ID
p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI
p_HHDU;o_WWj;l_WWOJ;g_jjjDI

I would like to keep two tokens, the one that starts with p_ and the one that starts with g_, and drop everything between them. Do you have any suggestions on how to do this? I have been trying with gsub, but without success so far. Thank you in advance.

Upvotes: 0

Views: 147

Answers (3)

akrun

Reputation: 887621

We can use sub, capturing the p_ and g_ tokens and keeping only those two groups:

sub(";*(p_\\w+).*;*(g_\\w+).*", "\\1;\\2", df1$ID)
#[1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI" 

Or with gsub, removing every token whose prefix is neither p nor g:

gsub("[^pg]_\\w+;", "", df1$ID)
#[1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI" 

data

df1 <- structure(list(ID = c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI", "p_HHDU;o_WWj;l_WWOJ;g_jjjDI"
)), .Names = "ID", class = "data.frame", row.names = c(NA, -2L))

Upvotes: 0

Samuel

Reputation: 3053

I suggest using the stringr package, which makes this easy:

library(stringr)

a <- "p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI"
b <- "p_HHDU;o_WWj;l_WWOJ;g_jjjDI"

str_extract(string = a, pattern = c("p_[a-zA-Z]+", "g_[a-zA-Z]+"))

# [1] "p_IIJSJ" "g_jjjdI"

str_extract(string = b, pattern = c("p_[a-zA-Z]+", "g_[a-zA-Z]+"))

# [1] "p_HHDU"  "g_jjjDI"
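
Since the question is about a data.frame column rather than single strings, here is a minimal sketch of how the same patterns might be applied to every row at once. It assumes the data look like the df1 posted in the question; the column name ID_short is made up for illustration.

```r
library(stringr)

# Example data copied from the question
df1 <- data.frame(
  ID = c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI", "p_HHDU;o_WWj;l_WWOJ;g_jjjDI"),
  stringsAsFactors = FALSE
)

# str_extract is vectorised over the strings, so each call returns
# the first match per row
p_part <- str_extract(df1$ID, "p_[a-zA-Z]+")
g_part <- str_extract(df1$ID, "g_[a-zA-Z]+")

# ID_short is a hypothetical column name, not something from the question
df1$ID_short <- paste(p_part, g_part, sep = ";")
df1$ID_short
# [1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI"
```

Extracting the two tokens separately avoids relying on pattern recycling, which only lines up when the number of strings and the number of patterns happen to match.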

Upvotes: 1

Sotos

Reputation: 51592

An approach with strsplit:

sapply(strsplit(x, ';'), function(i) paste(grep('p_|g_', i, value = TRUE), collapse = ';'))
#[1] "p_IIJSJ;g_jjjdI"

or, if the order is always the same (as @Jaap mentions), keep the first and fourth elements:

sapply(strsplit(df$ID,';'), function(x) paste(x[c(1,4)], collapse=';'))
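
For completeness, a rough sketch of the split-and-filter idea applied to the whole column, assuming the data look like the df1 from the question. The helper name keep_pg is made up for illustration, and the pattern is anchored with ^ here (a small change from the snippet above) so that only tokens that start with p_ or g_ are kept.

```r
# Example data copied from the question
df1 <- data.frame(
  ID = c("p_IIJSJ;o_OODJ;l_jjjjw;g_jjjdI", "p_HHDU;o_WWj;l_WWOJ;g_jjjDI"),
  stringsAsFactors = FALSE
)

# keep_pg is a hypothetical helper: split each string on ";",
# keep the tokens starting with p_ or g_, and paste them back together
keep_pg <- function(ids) {
  # as.character() guards against the column being stored as a factor
  parts <- strsplit(as.character(ids), ";", fixed = TRUE)
  vapply(
    parts,
    function(tokens) paste(tokens[grepl("^(p_|g_)", tokens)], collapse = ";"),
    character(1)
  )
}

df1$ID <- keep_pg(df1$ID)
df1$ID
# [1] "p_IIJSJ;g_jjjdI" "p_HHDU;g_jjjDI"
```

Unlike indexing with c(1, 4), this does not depend on the position of the tokens, only on their prefixes.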

Upvotes: 2
