Reputation: 31

How could I pick the string by gsub?

I have strings like this:

df
[1] "XID\t5647: asasaasa" "XID\t1540"

How could I pick only the numbers after "XID\t" using gsub? I've tried gsub as follows:

gsub(".*XID\t(.*)\\:.*", "\\1", df)
#[1] "5647"     "XID\t1540"

gsub(".*XID\t(.*)", "\\1", df)
#[1] "5647: asasaasa" "1540"

but what I want is:

[1] "5647" "1540"

I think the two cases overlap, so it seems I would have to call gsub twice to get the result I want. Is there a way to do it in one step? Thank you.

Upvotes: 0

Answers (2)

randomSampling

Reputation: 136

Just replace anything that is not a digit with "":

x=c("XID\t5647: asasaasa", "XID\t1540" )
gsub("[^0-9]","",x)
#[1] "5647" "1540"
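One caveat: deleting every non-digit keeps *all* digits in the string, not just the ID. The sketch below uses a hypothetical input with an extra digit in the trailing text to show the difference. It assumes the ID always directly follows "XID\t", in which case capturing only that run of digits is safer:

```r
# Hypothetical input: the trailing text contains a digit too
y <- c("XID\t5647: item 2", "XID\t1540")

# Removing all non-digits glues the stray "2" onto the ID
gsub("[^0-9]", "", y)
#[1] "56472" "1540"

# Capturing only the digits right after "XID<tab>" keeps just the ID
sub(".*XID\t([0-9]+).*", "\\1", y)
#[1] "5647" "1540"
```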

Upvotes: 1

akrun

Reputation: 887651

We can use str_extract to extract the first run of digits (\\d+):

library(stringr)
str_extract(df, "\\d+")
#[1] "5647" "1540"
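If you'd rather avoid the stringr dependency, base R can do the same extraction. Here is a minimal sketch using regexpr() and regmatches(). Note that str_extract() returns NA for elements with no match, but regmatches() silently drops them, so the result can be shorter than the input:

```r
df <- c("XID\t5647: asasaasa", "XID\t1540")

# regexpr() locates the first run of digits in each string;
# regmatches() returns those matched substrings
regmatches(df, regexpr("[0-9]+", df))
#[1] "5647" "1540"
```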

Or use gsub to match all non-numeric characters (\\D+) and replace them with "".

gsub("\\D+", "", df)
#[1] "5647" "1540"

Or, keeping the OP's syntax, match one or more digits (\\d+) that follow "XID\t", capture them as a group ((...)), and replace the whole string with the backreference (\\1):

sub(".*XID\t(\\d+).*", "\\1", df)
#[1] "5647" "1540"
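Another option is a lookbehind, sketched here as an alternative rather than as part of the original answer. It matches only the digits immediately preceded by "XID\t" without rebuilding the rest of the string. In base R, lookbehinds require perl = TRUE:

```r
df <- c("XID\t5647: asasaasa", "XID\t1540")

# (?<=XID\t) requires "XID<tab>" just before the digits
# but does not include it in the match
ids <- regmatches(df, regexpr("(?<=XID\t)[0-9]+", df, perl = TRUE))
ids
#[1] "5647" "1540"

# Convert to integers if the IDs are needed as numbers
as.integer(ids)
#[1] 5647 1540
```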

data

df <- c("XID\t5647: asasaasa", "XID\t1540" )

Upvotes: 4
