Lars

Reputation: 25

What is wrong with my function and for-loop?

I am currently trying to count the number of country names mentioned in a long string. I have loaded a data frame named "countries" with a column "Countries" that contains all the countries in the world. I want to write a function that takes any string, loops over all the country names in my data frame and returns the total number of occurrences of any country name (i.e. the total number of country mentions).

Code:

number.of.countries <- function(str){
  # #Initialize
  countcountry <- 0

  # #loop over all countries:
  for (i in countries$Countries){

  # #Logical test:
    countries_mentioned <- grepl(i, str, perl = T, ignore.case = T)

  # #add to the count
    if (isTRUE(countries_mentioned)){
      countcountry <- countcountry + str_count(str, fixed(countries$Countries[i], ignore_case = TRUE))
    }     
  }                                                                            
  #Output
  return(countcountry)
}



###When running the function:
> number.of.countries(str)
[1] NA
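The NA almost certainly comes from the indexing inside the loop. for (i in countries$Countries) sets i to each country name in turn, not to a position, so countries$Countries[i] indexes an unnamed character vector by a name and returns NA. As soon as one country matches, str_count() is called with that NA pattern and returns NA, and adding NA to the running total keeps it NA. Below is a minimal corrected sketch. It assumes the stringr package and uses a small hypothetical countries data frame in place of the full list. It passes i itself as the pattern and drops the grepl() pre-check, because str_count() already returns 0 when there is no match.

```r
library(stringr)

# Hypothetical sample data standing in for the full "countries" data frame
countries <- data.frame(Countries = c("Austria", "Albania", "Australia"),
                        stringsAsFactors = FALSE)

number.of.countries <- function(str) {
  countcountry <- 0
  # i takes each country *name* in turn, not a numeric index
  for (i in countries$Countries) {
    # so use i directly as a literal, case-insensitive pattern;
    # countries$Countries[i] would be NA here
    countcountry <- countcountry + str_count(str, fixed(i, ignore_case = TRUE))
  }
  countcountry
}

number.of.countries("Austria and austria border Albania, unlike Australia.")
```

With the sample data this counts "Austria" twice (once lowercase), "Albania" once and "Australia" once.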

Upvotes: 2

Views: 53

Answers (2)

h3rm4n

Reputation: 4187

I guess you have multiple strings you want to check for countries. In that case you could do:

# example data
longstring <- c("The countries austria and Albania are in Europe, while Australia is not. Austria is the richest of the two European countries.",
                "In this second sentence we stress the fact that Australia is part of Australia.")
countries <- c("Austria","Albania","Australia","Azerbeyan")

With lapply and stri_count_fixed from the stringi package (which lets you control case sensitivity), you can get the counts for each country:

library(stringi)
l <- lapply(longstring, stri_count_fixed, pattern = countries, case_insensitive = TRUE)

The result:

[[1]]
[1] 2 1 1 0

[[2]]
[1] 0 0 2 0

Now you can transform that into a data frame with:

countdf <- setNames(do.call(rbind.data.frame, l), countries)
countdf$total <- rowSums(countdf)

The final result:

> countdf
  Austria Albania Australia Azerbeyan total
1       2       1         1         0     4
2       0       0         2         0     2

NOTE:

To demonstrate case_insensitive = TRUE, I wrote the first occurrence of "Austria" in longstring with a lowercase a.
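If a matrix is enough instead of a data frame, vapply() can build the whole table in one call, replacing lapply() followed by do.call(rbind.data.frame, ...). Here is a sketch that reuses the example data above. Each row is a string, each column is a country, and case_insensitive is again passed through to stri_count_fixed().

```r
library(stringi)

longstring <- c("The countries austria and Albania are in Europe, while Australia is not. Austria is the richest of the two European countries.",
                "In this second sentence we stress the fact that Australia is part of Australia.")
countries <- c("Austria", "Albania", "Australia", "Azerbeyan")

# vapply returns one column per string (one integer count per country);
# t() turns that into one row per string
counts <- t(vapply(longstring, stri_count_fixed, integer(length(countries)),
                   pattern = countries, case_insensitive = TRUE,
                   USE.NAMES = FALSE))
colnames(counts) <- countries

# total number of country mentions per string
total <- rowSums(counts)
```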

Upvotes: 0

Michael Bird

Reputation: 783

You can vectorise your function to make the code shorter and faster. An example would be:

library(stringr)
number.countries <- function(str, dictionary){
  # str_count() is vectorised over the patterns: one count per dictionary entry
  return(sum(str_count(str, dictionary)))
}
number.countries("England and Ireland, oh and also Wales", c("Wales","Ireland","England"))
[1] 3

The function can be passed any custom dictionary (in your case countries$Countries).
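This version has two caveats. First, str_count() interprets each dictionary entry as a regular expression and matches case-sensitively. Second, it recycles strings and patterns against each other element by element, so it is meant for a single input string. The sketch below is a variant that matches names literally and case-insensitively with fixed(), and it counts several strings by applying the function to each with vapply(). The name number.countries.ci is made up for illustration.

```r
library(stringr)

# Hypothetical variant of number.countries(): fixed() matches the names
# literally (no regex metacharacters) and ignores case
number.countries.ci <- function(str, dictionary) {
  sum(str_count(str, fixed(dictionary, ignore_case = TRUE)))
}

dict <- c("Wales", "Ireland", "England")
number.countries.ci("england and Ireland, oh and also WALES", dict)

# Apply the function to each string separately instead of passing a
# vector of strings to str_count() together with a vector of patterns
texts <- c("England and Ireland", "Only Wales here")
vapply(texts, number.countries.ci, integer(1), dictionary = dict,
       USE.NAMES = FALSE)
```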

Upvotes: 1
