Reputation: 1
In a data.table I have a column of company names that sometimes include the company's city. Given a vector of all existing cities, I would like to detect whether a city name is part of the company name and, if so, extract the city into a new column. I currently use a for loop in R that, for every row of my data.table, iterates over all cities in my vector. This takes a very long time. Is there a way to vectorize this operation to make it computationally more efficient?
Current data:

Company_name | Location |
---|---|
Company 1 Berlin Gmbh. | NA |
Dresden Company 2 Gmbh. | NA |
Company 3 in Hamburg | NA |
Company 4 Ldt | NA |

Desired output:

Company_name | Location |
---|---|
Company 1 Berlin Gmbh. | Berlin |
Dresden Company 2 Gmbh. | Dresden |
Company 3 in Hamburg | Hamburg |
Company 4 Ldt | NA |
Upvotes: 0
Views: 435
Reputation: 24722
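# Collapse all cities into one regex ("Berlin|Dresden|Hamburg") and extract the first match per row
df[, city := stringr::str_extract(Company, paste0(cities, collapse = "|"))]
Or:
# This also works: test each city against the row's company name (the \(x) lambda syntax needs R >= 4.1)
df[, city := cities[sapply(cities, \(x) grepl(x, Company))], by = 1:nrow(df)]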
Output:
Company city
1: Company 1 Berlin Gmbh. Berlin
2: Dresden Company 2 Gmbh. Dresden
3: Company 3 in Hamburg Hamburg
4: Company 4 Ldt <NA>
Input:
library(data.table)
df = data.table(
  Company = c(
    "Company 1 Berlin Gmbh.",
    "Dresden Company 2 Gmbh.",
    "Company 3 in Hamburg",
    "Company 4 Ldt"
  )
)
cities = c('Berlin', 'Dresden', 'Hamburg')
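
If you would rather avoid the stringr dependency, the same single-pattern idea can be sketched in base R. This is only an illustration under a few assumptions: the helper name `extract_city` is made up here, the word boundaries (`\\b`) are an addition that the one-liners above do not have, and the city names are assumed to contain no regex metacharacters (a name such as "Frankfurt (Oder)" would need escaping first).

```r
# Hypothetical helper (not from the answer above): returns the first city
# found in each element of x, or NA when no city matches.
extract_city <- function(x, cities) {
  # One alternation pattern for all cities; \\b restricts matches to whole
  # words. Assumes the city names contain no regex metacharacters.
  pattern <- paste0("\\b(", paste(cities, collapse = "|"), ")\\b")

  # regexpr() is vectorized over x and returns the position of the first
  # match in each element (-1 when there is none).
  m <- regexpr(pattern, x, perl = TRUE)

  # Start with NA everywhere, then fill in the elements that matched.
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m != -1
  out[hit] <- regmatches(x, m)
  out
}

company <- c(
  "Company 1 Berlin Gmbh.",
  "Dresden Company 2 Gmbh.",
  "Company 3 in Hamburg",
  "Company 4 Ldt"
)
cities <- c("Berlin", "Dresden", "Hamburg")

city <- extract_city(company, cities)
print(city)
```

With the data.table from the input above, this would be used as `df[, city := extract_city(Company, cities)]`. Like `str_extract`, it keeps only the first city found in each name.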
Upvotes: 2