stochastiq

Reputation: 269

How to split string within each element of a list in R, and keep unique strings in element

I have a list

reqr:
chr [1:3] "interpersonal" "communication" "communication and interpersonal"
chr [1:2] "team player" "initiative"
chr [1:2] "mechanical engineering" "written"

How do I split up strings that contain "and", such that

reqr:
chr [1:5] "interpersonal" "communication" "communication" "and" "interpersonal"
chr [1:2] "team player" "initiative"
chr [1:2] "mechanical engineering" "written"

After that, I want every string in each element to be unique, such that

reqr:
chr [1:3] "interpersonal" "communication" "and"
chr [1:2] "team player" "initiative"
chr [1:2] "mechanical engineering" "written"

Upvotes: 0

Views: 2070

Answers (3)

alistaire

Reputation: 43334

Hadley's purrr package can make working with lists less annoying:

library(purrr)

# Split each item .x at a space that has "and" directly after or before it
# (alternate form: `map(strsplit, split = ' (?=and)|(?<=and) ', perl = TRUE)`).
# Then flatten and de-duplicate each element; `map(compose(unique, unlist))` is
# equivalent to `map(unlist) %>% map(unique)` or `simplify_all() %>% map(unique)`.
reqr %>% 
    map(~strsplit(.x, ' (?=and)|(?<=and) ', perl = TRUE)) %>% 
    map(compose(unique, unlist))

# [[1]]
# [1] "interpersonal" "communication" "and"          
# 
# [[2]]
# [1] "team player" "initiative" 
# 
# [[3]]
# [1] "mechanical engineering" "written"  

Data

reqr <- list(c("interpersonal", "communication", "communication and interpersonal"), 
             c("team player", "initiative"), 
             c("mechanical engineering", "written"))
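
To see what the lookaround pattern does on its own, here is a minimal base R sketch on a single string. The names `pattern` and `pieces` are mine, used only for illustration. The space before "and" and the space after it are each consumed as a separator, so "and" survives as its own token:

```r
# Same pattern as in the purrr pipeline above, applied to one string.
pattern <- " (?=and)|(?<=and) "
pieces <- strsplit("communication and interpersonal", pattern, perl = TRUE)[[1]]
print(pieces)
# [1] "communication" "and"           "interpersonal"

# A string with no "and" next to a space comes back unchanged.
untouched <- strsplit("team player", pattern, perl = TRUE)[[1]]
print(untouched)
# [1] "team player"
```

Note that `perl = TRUE` is required, since lookahead and lookbehind are not supported by R's default regex engine.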

Upvotes: 3

akrun

Reputation: 887078

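We can also do this with scan and gsub: gsub wraps every " and " in commas, and scan then reads the result back as comma-separated fields.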

lapply(reqr, function(x) unique(scan(text=gsub(" (and) ", ",\\1,", x), 
                    what = "", sep=",", quiet=TRUE)))
#[[1]]
#[1] "interpersonal" "communication" "and"          

#[[2]]
#[1] "team player" "initiative" 

#[[3]]
#[1] "mechanical engineering" "written"          

NOTE: No external packages used.
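
Broken into steps, the one-liner above might look like the following sketch for the first element of the list. The intermediate names `marked`, `fields` and `result` are my own, not part of the answer:

```r
x <- c("interpersonal", "communication", "communication and interpersonal")

# Step 1: turn every " and " into ",and," so that commas mark the split points.
marked <- gsub(" (and) ", ",\\1,", x)
print(marked[3])
# [1] "communication,and,interpersonal"

# Step 2: read the strings back as comma-separated character fields.
fields <- scan(text = marked, what = "", sep = ",", quiet = TRUE)

# Step 3: drop duplicates, keeping the first occurrence of each string.
result <- unique(fields)
print(result)
# [1] "interpersonal" "communication" "and"
```

Because the comma serves as the temporary separator, this approach assumes the input strings contain no commas of their own; any that do would be split there as well.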

Upvotes: 1

akuiper

Reputation: 214957

You can try this:

lst <- lapply(reqr, function(vec) unique(unlist(strsplit(vec, "\\s(?=and)|(?<=and)\\s", perl = TRUE))))

str(lst)
# List of 3
#  $ : chr [1:3] "interpersonal" "communication" "and"
#  $ : chr [1:2] "team player" "initiative"
#  $ : chr [1:2] "mechanical engineering" "written"
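
One caveat with the lookaround patterns used in this thread: they match the letters "and" anywhere, so a space next to a word that merely ends or starts with "and" (for example "brand new") is also treated as a split point. A possible refinement, offered here as my own assumption rather than something from the answers, is to add `\\b` word boundaries. `split_on_and` is a hypothetical helper name:

```r
# Hypothetical helper: split only around the standalone word "and",
# then flatten and de-duplicate.
split_on_and <- function(vec) {
  unique(unlist(strsplit(vec, " (?=and\\b)|(?<=\\band) ", perl = TRUE)))
}

reqr <- list(c("interpersonal", "communication", "communication and interpersonal"),
             c("team player", "initiative"),
             c("mechanical engineering", "written"))

bounded <- lapply(reqr, split_on_and)
str(bounded)
# List of 3
#  $ : chr [1:3] "interpersonal" "communication" "and"
#  $ : chr [1:2] "team player" "initiative"
#  $ : chr [1:2] "mechanical engineering" "written"

# The unbounded pattern splits "brand new"; the bounded one leaves it intact.
loose <- strsplit("brand new", "\\s(?=and)|(?<=and)\\s", perl = TRUE)[[1]]
print(loose)
# [1] "brand" "new"
strict <- split_on_and("brand new")
print(strict)
# [1] "brand new"
```

For the sample data in the question both versions give the same result, so the extra boundaries only matter if the real list contains such words.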

Upvotes: 3
