Reputation: 1773
I would like to count the number of unique codes that appear in each text string of a structured dataset. This is a follow-up to my previous post. Specifically, I want a unique count of apples (coded as App), bananas (coded as Ban), pineapples (coded as Pin), and grapes (coded as Grp).
text <- c('AppPinAppBan', 'AppPinOra', 'AppPinGrpLonNYC')
df <- data.frame(text)
library(stringr)
df$fruituniquecount <- str_count(df$text, "App|Ban|Pin|Grp")
## I am expecting output as follows:
text             fruituniquecount
AppPinAppBan     3
AppPinOra        2
AppPinGrpLonNYC  3
Upvotes: 1
Views: 55
Reputation: 51602
Following the same idea as the accepted answer to your previous question, you can do:
library(stringr)
sapply(str_extract_all(df$text, "App|Ban|Pin|Grp"), function(i) length(unique(i)))
# [1] 3 2 3
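To see why extracting the matches and then de-duplicating them matters, here is a small sketch comparing the total number of matches with the number of distinct matches. The variable names `total_matches` and `unique_matches` are only illustrative, and `vapply` is swapped in for `sapply` so the result type is fixed.

```r
library(stringr)

text <- c("AppPinAppBan", "AppPinOra", "AppPinGrpLonNYC")
pattern <- "App|Ban|Pin|Grp"

# str_count() counts every match, so the repeated "App" in the
# first string is counted twice.
total_matches <- str_count(text, pattern)

# str_extract_all() returns a list with one character vector of matches
# per string; unique() drops repeats before counting.
unique_matches <- vapply(
  str_extract_all(text, pattern),
  function(m) length(unique(m)),
  integer(1)
)

total_matches   # 4 2 3
unique_matches  # 3 2 3
```

Only the first string differs, because it is the only one in which a code occurs more than once.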
Upvotes: 2
Reputation: 76673
This can be done in base R, with no need for external packages.
m <- gregexpr("App|Ban|Pin|Grp", df$text)
df$fruituniquecount <- lengths(lapply(regmatches(df$text, m), unique))
df
#              text fruituniquecount
# 1    AppPinAppBan                3
# 2       AppPinOra                2
# 3 AppPinGrpLonNYC                3
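If the set of codes may change, the same base R approach could be wrapped in a small function. This is only a sketch: the name `count_unique_codes` is hypothetical, and it assumes the codes contain no regex metacharacters, since they are pasted directly into a pattern.

```r
# Hypothetical helper: for each element of `x`, count how many distinct
# entries of `codes` occur in it. Assumes the codes are plain text
# with no regex metacharacters.
count_unique_codes <- function(x, codes) {
  x <- as.character(x)                      # guard in case a factor is passed
  pattern <- paste(codes, collapse = "|")   # e.g. "App|Ban|Pin|Grp"
  matches <- regmatches(x, gregexpr(pattern, x))
  vapply(matches, function(m) length(unique(m)), integer(1))
}

text <- c("AppPinAppBan", "AppPinOra", "AppPinGrpLonNYC")
counts <- count_unique_codes(text, c("App", "Ban", "Pin", "Grp"))
counts
# [1] 3 2 3
```

A string with no matching codes yields 0, because `regmatches` returns an empty character vector for it.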
Upvotes: 3