Reputation: 798
I'm trying to extract the country codes from a column of phone numbers and move them into a new column.
Example data
data <- data.frame(phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"))
I only have a start so far. For instance, I can extract the + sign, but I can't figure out how to detect +1, +257, +91, etc.
library(dplyr)
library(stringr)
data |>
  mutate(country_code = str_extract(phone, "[:symbol:]"))
phone country_code
+1 800 000 000 +
+257000000000 +
+91-00 000 00 +
200000 000 NA
What I'm trying to achieve:
phone country_code
+1 800 000 000 +1
+257000000000 +257
+91-00 000 00 +91
200000 000 NA
I'm wondering whether I can match the possible country codes against another vector in which I specify the different variations, either like this:
codes <- c(1, 257, 91)
or like this:
codes <- c("+1", "+257", "+91")
Upvotes: 3
Views: 250
Reputation: 887621
Using base R
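codes <- c(1, 257, 91)
# Build "\\+(1|257|91)": a literal "+" followed by any one of the codes
pat <- sprintf("\\+(%s)", paste(codes, collapse = "|"))
i1 <- grepl(pat, data$phone)          # rows that contain one of the codes
data$country_code <- NA_character_    # rows without a match stay NA
data$country_code[i1] <- regmatches(data$phone[i1], regexpr(pat, data$phone[i1]))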
Output:
> data
phone country_code
1 +1 800 000 000 +1
2 +257000000000 +257
3 +91-00 000 00 +91
4 200000 000 <NA>
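A regex-free alternative, sketched here under the assumption that codes is the character form c("+1", "+257", "+91"): because the codes are plain prefixes, base R's startsWith() can test each number against every code directly, so nothing needs escaping. The helper name match_code() is hypothetical, and preferring the longest matching code is an added assumption for the case where one code in the vector is a prefix of another.

```r
data <- data.frame(phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"))
codes <- c("+1", "+257", "+91")

# Hypothetical helper: return the code that `phone` starts with, or NA if none does.
match_code <- function(phone, codes) {
  hits <- codes[startsWith(phone, codes)]  # startsWith() recycles `phone` over `codes`
  if (length(hits) == 0) return(NA_character_)
  hits[which.max(nchar(hits))]             # prefer the longest matching code
}

data$country_code <- vapply(data$phone, match_code, character(1),
                            codes = codes, USE.NAMES = FALSE)
data
```

For the example data this yields +1, +257, +91 and NA, the same result as the regex answers, but it only matches codes at the very start of the number.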
Upvotes: 1
Reputation: 17550
Since + is a special character in regular expressions, you have to escape it with \\. You can search for any of your pre-designated codes by first concatenating them all with the "or" symbol (|) and then using str_match() from the stringr package:
srch <- paste0("\\", paste(codes, collapse = "|\\"))
# [1] "\\+1|\\+257|\\+91"
stringr::str_match(data$phone, srch)
Output:
[,1]
[1,] "+1"
[2,] "+257"
[3,] "+91"
[4,] NA
Data
data <- data.frame(phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"))
codes <- c("+1", "+257", "+91")
Upvotes: 2
Reputation: 11596
Does this work?
library(dplyr)
library(stringr)
codes <- c(1, 257, 91)  # numeric codes; '\\+' is prepended to each one below
data %>%
  mutate(country_code = str_extract(phone, str_c('\\+', codes, collapse = '|')))
phone country_code
1 +1 800 000 000 +1
2 +257000000000 +257
3 +91-00 000 00 +91
4 200000 000 <NA>
Upvotes: 2