statsguyz

Reputation: 469

R function to find subset of words based on letter

I'm looking for a way to create a subset of words, from a list of words, where each word contains a specific letter.

Right now I know that I can use the gregexpr function to check whether a letter exists in a word, but I haven't been able to create a subset of the words that contain a specific letter.

I've been able to count how often each letter occurs across the list of words:

> letters_table2<-table(unlist(strsplit(newdata2, ""), use.names=FALSE))
> letters_table2

 a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z 
14  9 11  8 11  6  4  7 12  3  3  9 14  7  9  8  6 13 13  6  7  8  4  7  8  3 

I'd like to create, for each letter (a, b, c, etc.), a list of the words from newdata2 that contain that letter.

    newdata2
  [1] "ae" "aj" "al" "an" "av" "av" "ay" "ba" "bd" "bd" "bk" "bl" "bv" "ca" "cl" "cm" "co"
 [18] "cr" "cy" "dh" "dl" "dm" "ea" "ec" "ef" "er" "ex" "ex" "ez" "fm" "fo" "ft" "gi" "gy"
 [35] "hb" "hm" "hr" "hr" "hs" "id" "in" "io" "iq" "ir" "ir" "it" "iz" "ja" "js" "kn" "lc"
 [52] "ld" "le" "lp" "ls" "me" "mg" "mh" "mi" "mi" "mm" "mo" "ms" "nf" "nw" "ny" "ok" "op"
 [69] "ox" "pa" "pi" "pr" "ps" "ps" "py" "qc" "qf" "qm" "qu" "qy" "rn" "rr" "rs" "rt" "ru"
 [86] "sa" "so" "ss" "ts" "uc" "us" "uu" "ux" "vb" "vc" "vv" "vw" "wb" "wg" "xe" "xo" "xt"
[103] "yd" "yt" "za"

Upvotes: 0

Views: 124

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193667

I would suggest the following, where x is your character vector of words (newdata2 in your case):

setNames(lapply(letters, function(y) grep(y, x, value = TRUE)), letters)

Here's a simple example using just 5 letters instead of all 26.

set.seed(1)
mydata <- paste0(sample(letters[1:5], 15, TRUE), 
                 sample(letters[1:5], 15, TRUE))
table(unlist(strsplit(mydata, ""), use.names = FALSE))
## 
##  a  b  c  d  e 
##  4 11  2  7  6 
setNames(lapply(letters[1:5], function(y) {
  grep(y, mydata, value = TRUE)
}), letters[1:5])
## $a
## [1] "da" "ab" "aa"
## 
## $b
##  [1] "bc" "bd" "eb" "bd" "eb" "ab" "bb" "db" "be" "db"
## 
## $c
## [1] "bc" "ce"
## 
## $d
## [1] "bd" "bd" "dd" "da" "db" "db"
## 
## $e
## [1] "ce" "eb" "ee" "eb" "be"
## 
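
If you only need the words for one letter rather than the full list, logical indexing with grepl() should also work. The following is a minimal sketch, not a definitive implementation: `words` is a hypothetical short vector standing in for newdata2, and `words_with_a` and `by_letter` are names made up for illustration. Setting fixed = TRUE treats the letter as a literal string rather than a regular expression.

```r
# Hypothetical word vector standing in for newdata2 (an assumption for this sketch)
words <- c("ae", "aj", "ba", "bd", "ca", "cl", "dh", "ea")

# Words containing a single letter: grepl() returns a logical vector,
# which is then used to subset the original vector
words_with_a <- words[grepl("a", words, fixed = TRUE)]

# The same idea for a few letters at once, returned as a named list;
# sapply() with simplify = FALSE names the elements after the letters
by_letter <- sapply(c("a", "b", "c"), function(l) {
  grep(l, words, value = TRUE, fixed = TRUE)
}, simplify = FALSE, USE.NAMES = TRUE)
```

Individual subsets can then be pulled out by name, for example by_letter$b or by_letter[["c"]].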

Upvotes: 1
