Reputation: 31
I have strings like this:
df
[1] "XID\t5647: asasaasa" "XID\t1540"
How can I extract only the numbers after "XID\t" using gsub? I tried the following:
gsub(".*XID\t(.*)\\:.*", "\\1", df)
>[1] "5647" "XID\t1540"
or
gsub(".*XID\t(.*)", "\\1", df)
>[1] "5647: asasaasa" "1540"
But I expect this:
[1] "5647" "1540"
I think the two cases overlap, so it seems I would have to call gsub twice to get the result I want. Any ideas would be appreciated, thank you.
Upvotes: 0
Views: 90
Reputation: 136
Just replace anything that is not a digit with "":
x=c("XID\t5647: asasaasa", "XID\t1540" )
gsub("[^0-9]","",x)
#[1] "5647" "1540"
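One caveat, shown as a small sketch: stripping every non-digit also keeps digits that appear elsewhere in the string. The vector y below is a hypothetical input, not from the question, with a second number after the ID. Capturing only the digits that directly follow "XID\t" avoids merging them.

```r
# Hypothetical input: the first element contains a second number after the ID
y <- c("XID\t5647: room 12", "XID\t1540")

# Removing every non-digit merges both numbers in the first element
all_digits <- gsub("[^0-9]", "", y)
all_digits
#[1] "564712" "1540"

# Capturing only the digits right after "XID\t" keeps the ID on its own
id_only <- sub(".*XID\t([0-9]+).*", "\\1", y)
id_only
#[1] "5647" "1540"
```

For the strings in the question both approaches give the same result, so the simpler pattern is fine as long as each string holds a single number.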
Upvotes: 1
Reputation: 887651
We can use str_extract to match the numeric part (\\d+):
library(stringr)
str_extract(df, "\\d+")
#[1] "5647" "1540"
Or with gsub to match all non-numeric characters (\\D+) and replace them with "".
gsub("\\D+", "", df)
#[1] "5647" "1540"
Or, using the OP's syntax, match one or more digits (\\d+) that follow "XID\t", capture them as a group ((...)), and replace the whole string with the backreference (\\1).
sub(".*XID\t(\\d+).*", "\\1", df)
#[1] "5647" "1540"
df <- c("XID\t5647: asasaasa", "XID\t1540" )
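As a rough illustration of why the first attempt in the question left the second element untouched: that pattern requires a literal colon, and when a pattern does not match, gsub returns the string unchanged. The sketch below also shows a base R alternative to str_extract using regexpr and regmatches. The variable names attempt and ids are only for illustration, and the colon is written unescaped here, which should behave the same as the escaped form in the question.

```r
df <- c("XID\t5647: asasaasa", "XID\t1540")

# The pattern demands a ":" after the captured group; "XID\t1540" has none,
# so there is no match and the element comes back unchanged
attempt <- gsub(".*XID\t(.*):.*", "\\1", df)
attempt
#[1] "5647"      "XID\t1540"

# Base R alternative: regexpr locates the first run of digits in each string,
# and regmatches extracts the matched text
ids <- regmatches(df, regexpr("[0-9]+", df))
ids
#[1] "5647" "1540"
```

Note that regmatches drops elements that have no match, so the result can be shorter than the input if some strings contain no digits.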
Upvotes: 4