Reputation: 95
Say I have the following dataframe consisting of two vectors containing character strings:
df <- data.frame(
"ID"= c("1a", "1b", "1c", "1d"),
"Codes" = c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX"),
stringsAsFactors = FALSE)
I'd like a simple way to determine which characters have been used in a given vector. In other words, the output of such a function would reveal:
find.characters(df$Codes) # hypothetical function
[1] "B" "G" "M" "W" "X" "R" "Y" "O" "|" "."
find.characters(df$ID) # hypothetical function
[1] "1" "a" "b" "c" "d"
Upvotes: 1
Views: 560
Reputation: 887038
You can write a custom function to do this. The idea is to split the strings into individual characters with strsplit(v1, ''), which returns a list. We unlist it to get a single vector, then keep the unique elements. At this point the result is not sorted yet. Based on the example shown, you may want to sort the letters and the other characters separately. So we use grepl to index the uppercase letters, sort the two subsets separately, and concatenate them with c().
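find.characters <- function(v1){
  # split into single characters, flatten the list, drop duplicates
  x1 <- unique(unlist(strsplit(v1, '')))
  # TRUE for uppercase letters, FALSE for everything else
  indx <- grepl('[A-Z]', x1)
  # sort the two groups separately, letters first
  c(sort(x1[indx]), sort(x1[!indx]))
}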
find.characters(df$Codes)
#[1] "B" "G" "M" "O" "R" "W" "X" "Y" "|" "."
find.characters(df$ID)
#[1] "1" "a" "b" "c" "d"
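To make the steps described in this answer easier to follow, here is a minimal sketch that runs them one at a time. The variable names chars and flat are only illustrative, and v1 holds just two of the strings from the question.

```r
# Two of the question's strings, used here only to trace the steps
v1 <- c("BX.MX|GX.WX", "MX.RX|BX.YX")

chars <- strsplit(v1, "")   # list: one character vector per input string
flat  <- unlist(chars)      # flatten the list into one character vector
x1    <- unique(flat)       # drop duplicates, keeping first-appearance order
x1
# [1] "B" "X" "." "M" "|" "G" "W" "R" "Y"

indx <- grepl("[A-Z]", x1)  # TRUE where the element is an uppercase letter
sort(x1[indx])
# [1] "B" "G" "M" "R" "W" "X" "Y"
x1[!indx]                   # the remaining characters, not yet sorted
# [1] "." "|"
```

Note that sort() on character vectors follows the collation rules of the current locale, so the relative order of punctuation such as "." and "|" may differ between systems.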
NOTE: Generally, I would use grepl('[A-Za-z]', x1), but I didn't do that here because it would place the lowercase letters before the digit, which does not match the expected result for the 'ID' column.
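As a quick illustration of the note above, this sketch shows what the broader pattern would do to the 'ID' values. The function name find.characters.all.letters is hypothetical and only used here to avoid overwriting find.characters; ids repeats the ID column from the question.

```r
# Hypothetical variant that treats both upper- and lowercase as letters
find.characters.all.letters <- function(v1) {
  x1 <- unique(unlist(strsplit(v1, "")))
  indx <- grepl("[A-Za-z]", x1)        # letters of either case
  c(sort(x1[indx]), sort(x1[!indx]))   # letters first, then the rest
}

ids <- c("1a", "1b", "1c", "1d")       # same values as df$ID
find.characters.all.letters(ids)
# [1] "a" "b" "c" "d" "1"
```

The digit now comes last, whereas the question expects "1" first.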
Upvotes: 3
Reputation: 113
find.characters <- function(x) {
  # split each string into characters, flatten the list, drop duplicates
  unique(c(strsplit(x, split = ""), recursive = TRUE))
}
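A short usage sketch for this version, restating the function so the snippet runs on its own and rebuilding the question's vectors as codes and ids (names chosen here for illustration). Because nothing is sorted, the characters come back in order of first appearance.

```r
# Restated from the answer above so the example is self-contained
find.characters <- function(x) {
  unique(c(strsplit(x, split = ""), recursive = TRUE))
}

codes <- c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX")
ids   <- c("1a", "1b", "1c", "1d")

find.characters(codes)
# [1] "B" "X" "." "M" "|" "G" "W" "R" "Y" "O"
find.characters(ids)
# [1] "1" "a" "b" "c" "d"
```

This returns the same set of characters as in the question, but in a different order for the 'Codes' column.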
Upvotes: 1