monteromati

Reputation: 67

Split string and concatenate removing whole word in R

I am trying to remove the entries "Arts and Humanities" and "Social Sciences" from a string that lists different disciplines of knowledge separated by "/", as follows:

string = "Arts and Humanities Other Topics/Social Sciences Other Topics/Arts and Humanities/Social Sciences/Sociology"

I have tried this using the stringr package:

sapply(strsplit(string, "/"), function(x) paste(str_remove(x, "\\bArts and Humanities\\b|\\bSocial Sciences\\b"), collapse = "/"))

But the output generated is " Other Topics/ Other Topics///Sociology" and I need an output like this:

"Arts and Humanities Other Topics/Social Sciences Other Topics/Sociology"

Thanks in advance.

Upvotes: 0

Views: 413

Answers (2)

Greg

Reputation: 3326

Your approach just needs a little tweaking. The version below also generalizes from a single string to a vector, strings, of such strings:

Solution

sapply(
  # Split each string by "/" into its components.
  X = strsplit(x = strings, split = "/"),
  # Remove undesired components and then reassemble the strings.
  FUN = function(v){paste0(
    # Use subscripting to filter out matches.
    v[!grepl(x = v, pattern = "^\\s*(Arts and Humanities|Social Sciences)\\s*$")],
    # Reassemble components as separated by "/".
    collapse = "/"
  )},
  
  # Make the result a vector like the original 'string' (rather than a list).
  simplify = TRUE,
  USE.NAMES = FALSE
)
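To illustrate why the anchoring matters, here is a minimal sketch comparing the word-boundary pattern from the question with the anchored pattern used above. The variable names are illustrative only and do not come from the question or the answer.

```r
# Components of one string after splitting on "/".
parts <- c("Arts and Humanities Other Topics", "Arts and Humanities", "Sociology")

# Pattern from the question: matches the phrase anywhere inside a component.
pattern_word <- "\\bArts and Humanities\\b|\\bSocial Sciences\\b"

# Anchored pattern: matches only when the whole component is the phrase.
pattern_whole <- "^\\s*(Arts and Humanities|Social Sciences)\\s*$"

matches_word  <- grepl(pattern = pattern_word,  x = parts)
matches_whole <- grepl(pattern = pattern_whole, x = parts)

print(matches_word)   # TRUE TRUE FALSE
print(matches_whole)  # FALSE TRUE FALSE
```

With the anchored pattern, only the second component is flagged, so "Arts and Humanities Other Topics" is left intact.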

Result

Given a vector of strings like this

strings <- c(
  "Arts and Humanities Other Topics/Social Sciences Other Topics/Arts and Humanities/Social Sciences/Sociology",
  "Sociology/Arts and Humanities"
)

this solution should yield the following result:

[1] "Arts and Humanities Other Topics/Social Sciences Other Topics/Sociology"
[2] "Sociology"

Note

A solution that uses unlist() will collapse everything into a single, giant string, rather than reassembling each string in strings.
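As a small sketch of that pitfall, assuming a two-element vector made up for this illustration (the inputs and variable names are not from the question):

```r
# Hypothetical two-element input, for illustration only.
strings  <- c("Arts and Humanities/Sociology", "Social Sciences/History")
unwanted <- c("Arts and Humanities", "Social Sciences")

# unlist() flattens the per-string components into one long vector,
# so the boundary between the two original strings is lost.
parts     <- unlist(strsplit(strings, "/"))
collapsed <- paste0(parts[!parts %in% unwanted], collapse = "/")

print(length(collapsed))  # 1, although 'strings' has 2 elements
print(collapsed)          # "Sociology/History"
```

Both inputs end up merged into one output string, which is why the solution above processes each split result separately inside sapply().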

Upvotes: 1

AlexB

Reputation: 3269

One way would be to split the whole string and then exclude the parts that you are not interested in:

paste0(unlist(strsplit(string, '/'))[!unlist(strsplit(string, '/')) %in% c("Arts and Humanities", "Social Sciences")],
      collapse = '/')

or

paste0(base::setdiff(unlist(strsplit(string, '/')),
        c("Arts and Humanities", "Social Sciences")), collapse = '/')

#"Arts and Humanities Other Topics/Social Sciences Other Topics/Sociology"
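One caveat worth noting: the two variants are not fully interchangeable, because setdiff() treats its input as a set and therefore drops duplicated components. A minimal sketch, using a made-up string with a repeated entry (not from the question):

```r
# Hypothetical input in which "Sociology" appears twice.
string   <- "Sociology/Arts and Humanities/History/Sociology"
unwanted <- c("Arts and Humanities", "Social Sciences")

parts <- unlist(strsplit(string, "/"))

# %in% filtering keeps every remaining component, duplicates included.
via_in <- paste0(parts[!parts %in% unwanted], collapse = "/")

# setdiff() returns unique values, so the second "Sociology" disappears.
via_setdiff <- paste0(setdiff(parts, unwanted), collapse = "/")

print(via_in)       # "Sociology/History/Sociology"
print(via_setdiff)  # "Sociology/History"
```

For the string in the question there are no duplicates, so both variants give the same result.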

Upvotes: 1
