desired login

Reputation: 1190

Extracting number from the beginning of a string with regexp

I am trying to extract the number at the beginning of a string in R. I have tried this:

> tt <- "51 - TS - Data estimated - see comments"
> grep('^[0-9]+', tt, value=T)
[1] "51 - TS - Data estimated - see comments"

Why is it returning the whole string and not just the number?

Upvotes: 4

Views: 93

Answers (2)

G. Grothendieck

Reputation: 269596

1) sub. Try this, which removes the first non-digit and everything after it, leaving only the leading digits:

> sub("\\D.*", "", tt)
[1] "51"

2) strsplit. Or this, which splits the string on non-digits and takes the first component:

> strsplit(tt, "\\D")[[1]][1]
[1] "51"

3) strapplyc. Or this, which extracts the leading digits directly (requires the gsubfn package):

> library(gsubfn)
> strapplyc(tt, "^\\d+", simplify = TRUE)
[1] "51"
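
As a rough check of how the first two approaches behave on input that does not start with a digit, here is a minimal sketch. The vector `x` and its second element are made-up test data, not from the question. In that case both approaches return an empty string rather than an error, so it may be worth converting empty results to NA:

```r
# Hypothetical test data: the second element has no leading number.
x <- c("51 - TS - Data estimated", "TS - 51")

# 1) sub: drop the first non-digit and everything after it.
a <- sub("\\D.*", "", x)

# 2) strsplit: split on non-digits, keep the first piece of each element.
b <- sapply(strsplit(x, "\\D"), `[`, 1)

# Both give "" when the string does not start with a digit;
# turn those empty strings into NA so they are easy to spot.
a[!nzchar(a)] <- NA
b[!nzchar(b)] <- NA
```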

Upvotes: 2

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

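grep tells you which elements of the input match the pattern. It returns either their positions or, with value=TRUE, the matching elements in full. It never returns just the part of the string that matched.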
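
To illustrate, here is a small sketch using a two-element vector; the second element is an invented non-matching string added only for contrast:

```r
# The first element is from the question; the second is made up.
tt2 <- c("51 - TS - Data estimated - see comments", "no leading number")

idx  <- grep("^[0-9]+", tt2)                # positions of matching elements
vals <- grep("^[0-9]+", tt2, value = TRUE)  # the matching elements, whole
hit  <- grepl("^[0-9]+", tt2)               # TRUE/FALSE per element
```

Here `idx` is 1, `vals` is the entire first string, and `hit` is TRUE FALSE. None of them isolates "51".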

Try gsub or gregexpr+regmatches instead:

gsub("(^[0-9]+).*", "\\1", tt)
# [1] "51"

x <- gregexpr("^[0-9]+", tt)
regmatches(tt, x)
# [[1]]
# [1] "51"
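
If a plain vector is preferred over the list that gregexpr gives, one possible variation is regexpr, which finds only the first match per element. This is a sketch rather than part of the original answer; the extra test strings and the names `m`, `out` and `num` are assumptions for illustration:

```r
# The first element is from the question; the others are made-up test data.
tt3 <- c("51 - TS - Data estimated - see comments",
         "7 apples",
         "no leading number")

m <- regexpr("^[0-9]+", tt3)          # -1 where there is no match

# regmatches() drops non-matching elements, so fill a result vector by position.
out <- rep(NA_character_, length(tt3))
out[m != -1] <- regmatches(tt3, m)

num <- as.integer(out)                # numeric result, NA where nothing matched
```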

Upvotes: 4
