quelopes

Reputation: 121

Extract specific numbers from string in R

I have this example:

> exemplo
V1   V2
local::/raiz/diretorio/adminadmin/    1
local::/raiz/diretorio/jatai_p_user/    2
local::/raiz/diretorio/adminteste/    3
local::/raiz/diretorio/adminteste2/    4
local::/raiz/diretorio/48808032191/    5
local::/raiz/diretorio/85236250110/    6
local::/raiz/diretorio/92564593100/    7
local::/raiz/diretorio/AACB/036/03643936451/  331
home::22723200159 3894
home::98476963300 3895
home::15239136149 3896
home::01534562567 3897

I would like to extract only the numbers with exactly 11 digits (from the first column), producing a result like this:

> exemplo
V1   V2
48808032191    5
85236250110    6
92564593100    7
03643936451   331
22723200159   3894
98476963300   3895
15239136149   3896
01534562567   3897

Any help would be great :-)

Upvotes: 0

Views: 92

Answers (4)

jbaums

Reputation: 27388

Here's one way using stringr, where d is your dataframe:

library(stringr)
m <- str_extract(d$V1, '\\d{11}')
na.omit(data.frame(V1=m, V2=d$V2))

#             V1   V2
# 5  48808032191    5
# 6  85236250110    6
# 7  92564593100    7
# 8  03643936451  331
# 9  22723200159 3894
# 10 98476963300 3895
# 11 15239136149 3896
# 12 01534562567 3897

The pattern above extracts the first 11 digits of any run of 11 or more digits. In response to @JoshO'Brien's comment, if you only want to match runs of exactly 11 digits, you can add lookarounds:

# stringr uses ICU regex, which supports lookarounds directly; the old perl() wrapper is deprecated
m <- str_extract(d$V1, '(?<!\\d)\\d{11}(?!\\d)')
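The lookaround pattern is not specific to stringr. Below is a minimal base-R sketch, assuming a small subset of the question's data plus one made-up 12-digit entry (hypothetical, added only to show that it is rejected); `perl = TRUE` is what enables the lookbehind and lookahead in `regexpr()`:

```r
# Small subset of the question's data; the last row is a hypothetical 12-digit case
d <- data.frame(
  V1 = c("local::/raiz/diretorio/adminadmin/",
         "local::/raiz/diretorio/48808032191/",
         "local::/raiz/diretorio/AACB/036/03643936451/",
         "home::01534562567",
         "home::123456789012"),
  V2 = c(1, 5, 331, 3897, 9999),
  stringsAsFactors = FALSE
)

# (?<!\\d) and (?!\\d) forbid a digit right before/after the run,
# so only runs of exactly 11 digits qualify
m <- regexpr("(?<!\\d)\\d{11}(?!\\d)", d$V1, perl = TRUE)

# regmatches() returns only the matched substrings, so subset V2 to the matching rows
res <- data.frame(V1 = regmatches(d$V1, m),
                  V2 = d$V2[m != -1],
                  stringsAsFactors = FALSE)
print(res)
```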

Upvotes: 3

Tyler Rinker

Reputation: 109874

Here's how I'd approach it. This can be done in base R, but stringi's consistent function naming makes it easy to use, and it's fast. I'd store the 11 digits in a new column rather than overwrite the old one.

dat <- read.table(text="V1   V2
local::/raiz/diretorio/adminadmin/    1
local::/raiz/diretorio/jatai_p_user/    2
local::/raiz/diretorio/adminteste/    3
local::/raiz/diretorio/adminteste2/    4
local::/raiz/diretorio/48808032191/    5
local::/raiz/diretorio/85236250110/    6
local::/raiz/diretorio/92564593100/    7
local::/raiz/diretorio/AACB/036/03643936451/  331
home::22723200159 3894
home::98476963300 3895
home::15239136149 3896
home::01534562567 3897", header=TRUE)


library(stringi)
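# first 11-digit run in each row (NA if none); unlike unlist() on stri_extract_all_regex(),
# this always returns one value per row, so it stays aligned with dat
dat[["V3"]] <- stri_extract_first_regex(dat[["V1"]], "\\d{11}")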
dat[!is.na(dat[["V3"]]), 3:2]

##             V3   V2
## 5  48808032191    5
## 6  85236250110    6
## 7  92564593100    7
## 8  03643936451  331
## 9  22723200159 3894
## 10 98476963300 3895
## 11 15239136149 3896
## 12 01534562567 3897
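Since this can also be done in base R, here is a minimal sketch of the same idea (a new column V3, with V1 left untouched) using `regexpr()` and `regmatches()` instead of stringi, on a three-row subset of the data:

```r
dat <- read.table(text = "V1 V2
local::/raiz/diretorio/adminteste2/ 4
local::/raiz/diretorio/48808032191/ 5
home::22723200159 3894", header = TRUE, stringsAsFactors = FALSE)

m <- regexpr("\\d{11}", dat$V1)           # match position per row, -1 where there is none
dat$V3 <- NA_character_                   # new column; V1 is not overwritten
dat$V3[m != -1] <- regmatches(dat$V1, m)  # fill only the rows that matched
out <- dat[!is.na(dat$V3), c("V3", "V2")]
print(out)
```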

Upvotes: 1

Roland

Reputation: 132706

DF <- read.table(text = "V1   V2
local::/raiz/diretorio/adminadmin/    1
local::/raiz/diretorio/jatai_p_user/    2
local::/raiz/diretorio/adminteste/    3
local::/raiz/diretorio/adminteste2/    4
local::/raiz/diretorio/48808032191/    5
local::/raiz/diretorio/85236250110/    6
local::/raiz/diretorio/92564593100/    7
local::/raiz/diretorio/AACB/036/03643936451/  331
home::22723200159 3894
home::98476963300 3895
home::15239136149 3896
home::01534562567 3897", header = TRUE)


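pattern <- "\\d{11}"
m <- regexpr(pattern, DF$V1)                 # match position per row, -1 if no match
DF1 <- DF[attr(m, "match.length") > -1, ]    # keep only rows that contain a match
DF1$V1 <- regmatches(DF$V1, m)               # replace V1 with the matched digits
DF1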

#            V1   V2
#5  48808032191    5
#6  85236250110    6
#7  92564593100    7
#8  03643936451  331
#9  22723200159 3894
#10 98476963300 3895
#11 15239136149 3896
#12 01534562567 3897

Upvotes: 2

LauriK

Reputation: 1929

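grep() (or grepl()) will tell you which elements contain such a number, but it does not extract the digits themselves; for that, combine regexpr() with regmatches(). The pattern would be something like "\\d{11}" or "[0-9]{11}" (the backslash must be doubled inside an R string).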
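A minimal base-R sketch of the difference, using three strings from the question: `grep()` returns indices (or whole elements with `value = TRUE`), while `regmatches()` with `regexpr()` returns just the matched digits.

```r
x <- c("local::/raiz/diretorio/adminadmin/",
       "local::/raiz/diretorio/48808032191/",
       "home::22723200159")

# grep() returns the positions of the matching elements
idx <- grep("[0-9]{11}", x)

# with value = TRUE it returns the whole matching strings, not only the digits
whole <- grep("[0-9]{11}", x, value = TRUE)

# regexpr() + regmatches() extracts only the 11-digit substrings
digits <- regmatches(x, regexpr("[0-9]{11}", x))
print(digits)
```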

Upvotes: 0
