Luke

Reputation: 95

Determine all characters present in a vector of strings

Say I have the following dataframe consisting of two vectors containing character strings:

df <- data.frame(
      "ID"= c("1a", "1b", "1c", "1d"), 
      "Codes" = c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX"),
      stringsAsFactors = FALSE)

I'd like a simple way to determine which characters have been used in a given vector. In other words, the output of such a function would reveal:

find.characters(df$Codes) # hypothetical function
[1] "B" "G" "M" "W" "X" "R" "Y" "O" "|" "."

find.characters(df$ID) # hypothetical function
[1] "1" "a" "b" "c" "d"

Upvotes: 1

Views: 560

Answers (2)

akrun

Reputation: 887038

You can write a custom function for this. The idea is to split each string into individual characters with strsplit(v1, ''), which returns a list. unlist flattens that list into a vector, and unique keeps one copy of each character. The result is not sorted yet. Based on the example shown, you may want the letters and the other characters sorted separately, so grepl flags the uppercase letters, and that logical index is used to sort the two subsets separately before joining them with c().

 find.characters <- function(v1){
  x1 <- unique(unlist(strsplit(v1, '')))
  indx <- grepl('[A-Z]', x1)
  c(sort(x1[indx]), sort(x1[!indx]))
 }

 find.characters(df$Codes)
 #[1] "B" "G" "M" "O" "R" "W" "X" "Y" "|" "."

 find.characters(df$ID)
 #[1] "1" "a" "b" "c" "d"

NOTE: Generally, I would use grepl('[A-Za-z]', x1), but I didn't here because the expected result for the 'ID' column lists the digit before the lowercase letters.
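
To make the note concrete, here is a minimal sketch of the '[A-Za-z]' variant applied to the ID values from the question. The name find.characters.alpha is hypothetical and used only for this illustration. With both cases treated as letters, the lowercase letters are sorted first and the digit ends up last, which does not match the expected output.

```r
# Hypothetical variant: treat upper- and lowercase letters alike
find.characters.alpha <- function(v1) {
  x1 <- unique(unlist(strsplit(v1, "")))   # distinct characters, in order of first appearance
  indx <- grepl("[A-Za-z]", x1)            # TRUE for any ASCII letter
  c(sort(x1[indx]), sort(x1[!indx]))       # letters first, then everything else
}

ids <- c("1a", "1b", "1c", "1d")           # same values as df$ID in the question
find.characters.alpha(ids)
#[1] "a" "b" "c" "d" "1"
```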

Upvotes: 3

xiao

Reputation: 113

find.characters <- function(x) {
  # split every string into single characters, flatten the list, drop duplicates
  unique(c(strsplit(x, split = ""), recursive = TRUE))
}
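
A brief usage sketch, assuming the Codes values from the question (the function is restated so the snippet runs on its own). Because nothing is sorted, the characters come back in order of first appearance rather than in the order shown in the question.

```r
# Same approach as above: split, flatten, de-duplicate
find.characters <- function(x) {
  unique(c(strsplit(x, split = ""), recursive = TRUE))
}

codes <- c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX")  # df$Codes
find.characters(codes)
#[1] "B" "X" "." "M" "|" "G" "W" "R" "Y" "O"
```

If a particular order matters, the result can be passed through sort() afterwards; the sorted order of the punctuation characters may depend on the locale.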

Upvotes: 1
