Reputation: 121
I have this example:
> exemplo
V1 V2
local::/raiz/diretorio/adminadmin/ 1
local::/raiz/diretorio/jatai_p_user/ 2
local::/raiz/diretorio/adminteste/ 3
local::/raiz/diretorio/adminteste2/ 4
local::/raiz/diretorio/48808032191/ 5
local::/raiz/diretorio/85236250110/ 6
local::/raiz/diretorio/92564593100/ 7
local::/raiz/diretorio/AACB/036/03643936451/ 331
home::22723200159 3894
home::98476963300 3895
home::15239136149 3896
home::01534562567 3897
I would like to extract just the numbers with exactly 11 digits (in the first column), producing a result like this:
> exemplo
V1 V2
48808032191 5
85236250110 6
92564593100 7
03643936451 331
22723200159 3894
98476963300 3895
15239136149 3896
01534562567 3897
Any help would be great :-)
Upvotes: 0
Views: 92
Reputation: 27388
Here's one way using stringr, where d is your data frame:
library(stringr)
m <- str_extract(d$V1, '\\d{11}')
na.omit(data.frame(V1=m, V2=d$V2))
# V1 V2
# 5 48808032191 5
# 6 85236250110 6
# 7 92564593100 7
# 8 03643936451 331
# 9 22723200159 3894
# 10 98476963300 3895
# 11 15239136149 3896
# 12 01534562567 3897
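The approach above also matches the first 11 digits of any longer run of numerals. In response to @JoshO'Brien's comment, if you only want to match runs of exactly 11 numerals, add lookarounds so that the match cannot be preceded or followed by another digit. (In stringr 1.0.0 and later, perl() is deprecated and the pattern can be passed directly, since the default regex engine supports lookarounds.)
m <- str_extract(d$V1, perl('(?<!\\d)\\d{11}(?!\\d)'))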
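To make the difference between the two patterns concrete, here is a minimal sketch in base R (using perl = TRUE for the lookarounds rather than stringr). The input vector x and the helper extract_first() are hypothetical and only for illustration; the second element contains a run of 12 digits.

```r
# Hypothetical inputs: a run of exactly 11 digits and a run of 12 digits
x <- c("home::22723200159", "home::227232001599")

loose  <- "\\d{11}"                  # first 11 digits of any run of 11 or more
strict <- "(?<!\\d)\\d{11}(?!\\d)"   # a run of exactly 11 digits

# Illustrative helper: first match per element, NA where nothing matches
extract_first <- function(pattern, text) {
  m <- regexpr(pattern, text, perl = TRUE)
  hit <- as.integer(m) != -1L
  out <- rep(NA_character_, length(text))
  out[hit] <- regmatches(text, m)
  out
}

loose_hits  <- extract_first(loose, x)
strict_hits <- extract_first(strict, x)
```

Under these assumptions the loose pattern returns a value for both elements, while the strict pattern returns NA for the 12-digit run.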
Upvotes: 3
Reputation: 109874
Here's how I'd approach it. This can be done in base R, but stringi's consistent naming makes it easy to use, and it's fast as well. I'd store the 11 digits in a new column rather than overwrite the old one.
dat <- read.table(text="V1 V2
local::/raiz/diretorio/adminadmin/ 1
local::/raiz/diretorio/jatai_p_user/ 2
local::/raiz/diretorio/adminteste/ 3
local::/raiz/diretorio/adminteste2/ 4
local::/raiz/diretorio/48808032191/ 5
local::/raiz/diretorio/85236250110/ 6
local::/raiz/diretorio/92564593100/ 7
local::/raiz/diretorio/AACB/036/03643936451/ 331
home::22723200159 3894
home::98476963300 3895
home::15239136149 3896
home::01534562567 3897", header=TRUE)
library(stringi)
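# stri_extract_first_regex() returns one value per row (NA when nothing matches),
# so the result always lines up with the rows of the data frame
dat[["V3"]] <- stri_extract_first_regex(dat[["V1"]], "\\d{11}")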
dat[!is.na(dat[["V3"]]), 3:2]
## V3 V2
## 5 48808032191 5
## 6 85236250110 6
## 7 92564593100 7
## 8 03643936451 331
## 9 22723200159 3894
## 10 98476963300 3895
## 11 15239136149 3896
## 12 01534562567 3897
Upvotes: 1
Reputation: 132706
DF <- read.table(text = "V1 V2
local::/raiz/diretorio/adminadmin/ 1
local::/raiz/diretorio/jatai_p_user/ 2
local::/raiz/diretorio/adminteste/ 3
local::/raiz/diretorio/adminteste2/ 4
local::/raiz/diretorio/48808032191/ 5
local::/raiz/diretorio/85236250110/ 6
local::/raiz/diretorio/92564593100/ 7
local::/raiz/diretorio/AACB/036/03643936451/ 331
home::22723200159 3894
home::98476963300 3895
home::15239136149 3896
home::01534562567 3897", header = TRUE)
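pattern <- "\\d{11}"
# regexpr() gives the position of the first match per element, or -1 if none
m <- regexpr(pattern, DF$V1)
# keep only the rows that contain a match
DF1 <- DF[attr(m, "match.length") > -1, ]
# regmatches() returns just the matched substrings, in row order
DF1$V1 <- regmatches(DF$V1, m)
DF1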
# V1 V2
#5 48808032191 5
#6 85236250110 6
#7 92564593100 7
#8 03643936451 331
#9 22723200159 3894
#10 98476963300 3895
#11 15239136149 3896
#12 01534562567 3897
Upvotes: 2
Reputation: 1929
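A good starting point is grep(). The pattern to use there would be something like \\d{11} (with the backslash doubled inside an R string) or [0-9]{11}. Note that grep() only reports which elements match; to extract the digits themselves, pair the pattern with regexpr() and regmatches().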
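A minimal sketch of that idea in base R, assuming a small excerpt of the question's data. The names d, keep, and res are illustrative. grepl() (the logical variant of grep()) flags the rows containing an 11-digit run, and regexpr() with regmatches() pulls out the digits themselves.

```r
# Small excerpt of the question's data
d <- data.frame(
  V1 = c("local::/raiz/diretorio/adminadmin/",
         "local::/raiz/diretorio/48808032191/",
         "home::22723200159"),
  V2 = c(1, 5, 3894),
  stringsAsFactors = FALSE
)

pattern <- "[0-9]{11}"

# grepl() marks which rows contain a match
keep <- grepl(pattern, d$V1)

# regexpr() locates the first match per row; regmatches() extracts it
res <- d[keep, ]
res$V1 <- regmatches(d$V1, regexpr(pattern, d$V1))
res
```

This pattern, like \\d{11}, also matches inside longer digit runs, so it fits only if the data has no runs of 12 or more digits.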
Upvotes: 0