Reputation: 463
I am trying to extract the first letter of each comma-separated string in a cell, then count how many times that letter appears. An example of a column in my data frame looks like this:
test <- data.frame("Code" = c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK",
"RRRF"))
And I'd want a column added next to it that looks like this:
test2 <- data.frame("Code" = c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK",
"RRRF"), "Code_Count" = c("E1, S1", "E1", "S1, R2", "R1"))
The code count column extracts the first letter of the string and counts how many times that letter appears in that specific cell.
I looked into using strsplit to get the first letter in the column separated by commas, but I'm not sure how to attach the count of how many times that letter appears in the cell to it.
Upvotes: 2
Views: 675
Reputation: 5138
Here is one option using base R. This splits the Code column on a comma followed by at least one space, tabulates the first letters of the resulting pieces, then pastes each letter and its count back together into your desired output. Note that table() sorts the letters alphabetically, so the order within a cell may not match your output (row 3 gives "R2, S1" rather than "S1, R2"). Hope this helps!
test2$Code_Count2 <- sapply(strsplit(test2$Code, ",\\s+"), function(x) {
  tab <- table(substr(x, 1, 1))            # Tabulate the first letter of each piece
  paste0(names(tab), tab, collapse = ", ") # Paste each letter with its count and collapse them
})
test2
              Code Code_Count Code_Count2
1       EKST, STFO     E1, S1      E1, S1
2             EFGG         E1          E1
3 SSGG, RRRR, RRFK     S1, R2      R2, S1
4             RRRF         R1          R1
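If the order of first appearance matters, one possible tweak (a sketch, not part of the original answer) is to convert the first letters to a factor whose levels follow their order of appearance before tabulating. The helper name count_first_letters and the column name Code_Count_Ordered are made up for illustration.

```r
# Self-contained sample data; stringsAsFactors = FALSE keeps Code as character
test <- data.frame(Code = c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK", "RRRF"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: count first letters, keeping order of first appearance
count_first_letters <- function(x) {
  first <- substr(x, 1, 1)
  # Levels follow unique(first), so table() does not re-sort alphabetically
  tab <- table(factor(first, levels = unique(first)))
  paste0(names(tab), tab, collapse = ", ")
}

test$Code_Count_Ordered <- sapply(strsplit(test$Code, ",\\s+"), count_first_letters)
test$Code_Count_Ordered
```

With this variant the third row should come out as "S1, R2", matching the requested column.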
Here is a tidier stringr/purrr solution that extracts the first letter of each word directly (instead of splitting the string) and then does the same tabulation:
library(purrr)
library(stringr)
map_chr(str_extract_all(test2$Code, "\\b[A-Z]"), function(x) {
  tab <- table(x)                          # x already holds only the first letters
  paste0(names(tab), tab, collapse = ", ")
})
Data:
test2 <- data.frame("Code" = c("EKST, STFO", "EFGG", "SSGG, RRRR, RRFK",
"RRRF"), "Code_Count" = c("E1, S1", "E1", "S1, R2", "R1"))
test2[] <- lapply(test2, as.character) # factor to character
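The conversion on the last line matters mainly on R versions before 4.0.0, where data.frame() turned strings into factors by default and strsplit() rejects factors. A small sketch of that situation, forcing the old behaviour with stringsAsFactors = TRUE (the name old_style is only illustrative):

```r
# Simulate the pre-4.0.0 default by requesting factors explicitly
old_style <- data.frame(Code = c("EKST, STFO", "EFGG"), stringsAsFactors = TRUE)
is.factor(old_style$Code)   # TRUE

# strsplit(old_style$Code, ",\\s+") would stop here with "non-character argument"

# Convert every column to character, keeping the data frame structure
old_style[] <- lapply(old_style, as.character)
parts <- strsplit(old_style$Code, ",\\s+")
parts
```

On R 4.0.0 or later the columns are already character, so the conversion is harmless but not needed.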
Upvotes: 4