writer_typer

Reputation: 798

How to extract country codes from phone numbers?

I'm trying to extract the country codes and move them into a new column.

Example data

data <- data.frame(phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"))

I only have a start so far. For instance, I can extract the + sign, but I'm trying to find out how to detect +1, +257, +91, etc.

library(dplyr)
library(stringr)
data |>
  mutate(country_code = str_extract(phone, "[:symbol:]"))
phone            country_code
+1 800 000 000      +           
+257000000000       +           
+91-00 000 00       +           
200000 000          NA

What I'm trying to achieve:

phone            country_code
+1 800 000 000      +1
+257000000000       +257
+91-00 000 00       +91
200000 000          NA

I'm wondering if I can match possible country codes against another vector where I specify the different variations, either like this: codes <- c(1, 257, 91), or like this: codes <- c("+1", "+257", "+91").

Upvotes: 3

Views: 250

Answers (3)

akrun

Reputation: 887621

Using base R (this assumes codes holds the bare numbers, e.g. codes <- c(1, 257, 91)):

pat <- sprintf("\\+(%s)", paste(codes, collapse = "|"))
i1 <- grepl(pat, data$phone)
data$country_code[i1] <- regmatches(data$phone[i1], regexpr(pat, data$phone[i1]))

Output:

> data
           phone country_code
1 +1 800 000 000           +1
2  +257000000000         +257
3  +91-00 000 00          +91
4     200000 000         <NA>
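
Note that the pattern above is not anchored, so it would also match a listed code that follows a "+" later in the string. A minimal sketch of an anchored variant, assuming codes is the numeric vector from the question and that the country code always sits at the very start of the string (both are assumptions, not something stated in the answer):

```r
# Assumption: codes holds the bare numbers, as in the question's first variant.
data <- data.frame(phone = c("+1 800 000 000", "+257000000000",
                             "+91-00 000 00", "200000 000"))
codes <- c(1, 257, 91)

# "^" anchors the match to the start of the string; "\\+" is a literal plus sign.
pat <- sprintf("^\\+(%s)", paste(codes, collapse = "|"))

# regexpr() returns -1 for rows without a match;
# regmatches() then returns only the matched substrings.
m <- regexpr(pat, data$phone)
data$country_code <- NA_character_
data$country_code[m > 0] <- regmatches(data$phone, m)
```

Initialising the column with NA_character_ first keeps the rows without a match explicit rather than relying on the vector being padded during assignment.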

Upvotes: 1

jpsmith

Reputation: 17550

Since + is a special character in regular expressions, you have to escape it with \\. You can search for any of your pre-designated codes by first concatenating them with the "or" symbol (|), then using the stringr package's str_match:

srch <- paste0("\\",paste(codes, collapse = "|\\"))
# [1] "\\+1|\\+257|\\+91"

stringr::str_match(data$phone, srch)

Output:

     [,1]  
[1,] "+1"  
[2,] "+257"
[3,] "+91" 
[4,] NA 

Data

data <- data.frame(phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"))
codes <- c("+1", "+257", "+91")
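
str_match() returns a character matrix, so to get the new column the question asks for, one option is to keep only its first column (the full match). A small sketch under the same data and codes as above; the column name country_code is taken from the question:

```r
library(stringr)

data <- data.frame(phone = c("+1 800 000 000", "+257000000000",
                             "+91-00 000 00", "200000 000"))
codes <- c("+1", "+257", "+91")

# Escape the leading "+" of every code and join the codes with "|".
srch <- paste0("\\", paste(codes, collapse = "|\\"))

# Column 1 of the str_match() result holds the full match (NA when nothing matched).
data$country_code <- str_match(data$phone, srch)[, 1]
```

Since the pattern has no capture groups, str_extract(data$phone, srch) should give the same values directly as a vector.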

Upvotes: 2

Karthik S

Reputation: 11596

Does this work? It assumes codes holds the bare numbers, e.g. codes <- c(1, 257, 91):

library(dplyr)
library(stringr)

data %>%
  mutate(country_code = str_extract(phone, str_c('\\+', codes, collapse = '|')))
           phone country_code
1 +1 800 000 000           +1
2  +257000000000         +257
3  +91-00 000 00          +91
4     200000 000         <NA>

Upvotes: 2
