Timothy Harding

Reputation: 377

In R, use regular expression to match multiple patterns and add new column to list

I've found numerous examples of how to match and update an entire list with one pattern and one replacement, but what I am looking for now is a way to do this for multiple patterns and multiple replacements in a single statement or loop.

Example:

> print(recs)
  phonenumber amount
1     5345091    200
2     5386052    200
3     5413949    600
4     7420155    700
5     7992284    600

I would like to insert a new column called 'service_provider' with /^5/ as Company1 and /^7/ as Company2.

I can do this with the following two lines of R:

recs$service_provider[grepl("^5", recs$phonenumber)]<-"Company1"
recs$service_provider[grepl("^7", recs$phonenumber)]<-"Company2"

Then I get:

  phonenumber amount service_provider
1     5345091    200          Company1
2     5386052    200          Company1
3     5413949    600          Company1
4     7420155    700          Company2
5     7992284    600          Company2

I'd like to provide a list rather than a discrete set of grepl calls, so that it is easier to keep country-specific information in one place and all the programming logic in another.

thisPhoneCompanies<-list(c('^5','Company1'),c('^7','Company2'))

In other languages I would use a for loop over the phone company list:

For every row in thisPhoneCompanies
    Add service provider to matched entries in recs (such as the grepl statement)
end loop

But I understand that isn't the way to do it in R.
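
For reference, the loop described above can be written directly in R. This is only a minimal sketch: it rebuilds recs from the printed example, stores the phone numbers as character strings (an assumption, made so that pattern matching never sees a number formatted in scientific notation), and pre-fills the new column with NA so that unmatched rows stay NA.

```r
# Sample data rebuilt from the printed example; phone numbers are kept
# as character strings (an assumption, not stated in the question).
recs <- data.frame(
  phonenumber = c("5345091", "5386052", "5413949", "7420155", "7992284"),
  amount = c(200, 200, 600, 700, 600),
  stringsAsFactors = FALSE
)

# Each entry is c(pattern, company name), as in the question.
thisPhoneCompanies <- list(c("^5", "Company1"), c("^7", "Company2"))

# Start with NA so rows that match no pattern remain NA.
recs$service_provider <- NA_character_

# Apply each pattern in turn; later entries overwrite earlier ones
# if a number happens to match more than one pattern.
for (entry in thisPhoneCompanies) {
  recs$service_provider[grepl(entry[1], recs$phonenumber)] <- entry[2]
}

print(recs)
```

The pattern list stays separate from the matching logic, so adding a provider only means adding one entry to thisPhoneCompanies.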

Upvotes: 5

Views: 1048

Answers (2)

Dominic Comtois

Reputation: 10401

Using stringi:

library(stringi)
recs$service_provider <- stri_replace_all_regex(str = recs$phonenumber,
                                        pattern = c('^5.*','^7.*'), 
                                        replacement = c('Company1', 'Company2'),
                                        vectorize_all = FALSE)

recs
#   phonenumber amount service_provider
# 1     5345091    200         Company1
# 2     5386052    200         Company1
# 3     5413949    600         Company1
# 4     7420155    700         Company2
# 5     7992284    600         Company2

Upvotes: 4

Timothy Harding

Reputation: 377

Thanks to @thelatemail

Looks like if I use a dataframe instead of a list for the phone companies:

phcomp <- data.frame(ph=c(5,7),comp=c("Company1","Company2")) 

I can match and add a new column to my list of phone numbers in a single command (using the match function).

recs$service_provider <- phcomp$comp[match(substr(recs$phonenumber,1,1), phcomp$ph)]

Looks like I lose the ability to use regular expressions, but the matching here is very simple, just the first digit of the phone number.
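
If regular expressions are still needed, one possible sketch keeps the patterns in the lookup data frame and tests each one with grepl. The column name pattern and the helper variables hits and first_hit are hypothetical names introduced for illustration, and the phone numbers are stored as character strings here as an assumption.

```r
# Sample data rebuilt from the printed example (phone numbers as strings).
recs <- data.frame(
  phonenumber = c("5345091", "5386052", "5413949", "7420155", "7992284"),
  amount = c(200, 200, 600, 700, 600),
  stringsAsFactors = FALSE
)

# Lookup table holding a regular expression per company
# (the 'pattern' column is a hypothetical replacement for 'ph').
phcomp <- data.frame(
  pattern = c("^5", "^7"),
  comp = c("Company1", "Company2"),
  stringsAsFactors = FALSE
)

# hits[i, j] is TRUE when phone number i matches pattern j.
hits <- vapply(phcomp$pattern, grepl, logical(nrow(recs)), x = recs$phonenumber)
hits <- matrix(hits, nrow = nrow(recs))  # keep a matrix even with one row

# Index of the first matching pattern per row; NA when nothing matches.
first_hit <- apply(hits, 1, function(row) which(row)[1])

recs$service_provider <- phcomp$comp[first_hit]
print(recs)
```

Rows that match no pattern get NA, and when several patterns match, the one listed first in phcomp wins.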

Upvotes: 0
