user3570187

Reputation: 1773

Getting a unique count from structured text data

I am wondering how to count the distinct fruit codes that appear in each text string of a structured dataset. This is a follow-up to my previous question. I would like a unique count of apples (coded as App), bananas (coded as Ban), pineapples (coded as Pin), and grapes (coded as Grp), so that a code repeated within a string is counted only once.

    text <- c('AppPinAppBan', 'AppPinOra', 'AppPinGrpLonNYC')
    df <- data.frame(text)

    library(stringr)
    # str_count() counts every match, so a repeated code is counted each time
    df$fruituniquecount <- str_count(df$text, "App|Ban|Pin|Grp")

I am expecting the following output:

    text             fruituniquecount
    AppPinAppBan     3
    AppPinOra        2
    AppPinGrpLonNYC  3

Upvotes: 1

Views: 55

Answers (2)

Sotos

Reputation: 51602

Following the same idea as the accepted answer to your previous question, you can do:

    library(stringr)

    # extract all matches per string, then count the distinct ones
    sapply(str_extract_all(df$text, "App|Ban|Pin|Grp"), function(i) length(unique(i)))
    # [1] 3 2 3

Upvotes: 2

Rui Barradas

Reputation: 76673

This can also be done in base R, with no need for external packages.

    m <- gregexpr("App|Ban|Pin|Grp", df$text)
    df$fruituniquecount <- lengths(lapply(regmatches(df$text, m), unique))

    df
    #             text fruituniquecount
    #1    AppPinAppBan                3
    #2       AppPinOra                2
    #3 AppPinGrpLonNYC                3
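
If the list of codes is likely to change, the same base R idea could be wrapped in a small function. This is only a sketch: the name `count_unique_codes` and its `codes` argument are made up for illustration, and it assumes the codes contain no regex metacharacters, since they are pasted straight into the pattern.

```r
# Hypothetical helper (not part of any package): counts how many distinct
# codes from `codes` occur in each element of `x`.
# Assumes the codes contain no regex metacharacters.
count_unique_codes <- function(x, codes) {
  x <- as.character(x)                      # also handles factor columns
  pattern <- paste(codes, collapse = "|")   # e.g. "App|Ban|Pin|Grp"
  matches <- regmatches(x, gregexpr(pattern, x))
  # vapply() always returns an integer vector, one count per string
  vapply(matches, function(m) length(unique(m)), integer(1))
}

text <- c("AppPinAppBan", "AppPinOra", "AppPinGrpLonNYC")
df <- data.frame(text, stringsAsFactors = FALSE)
df$fruituniquecount <- count_unique_codes(df$text, c("App", "Ban", "Pin", "Grp"))
```

With the example data this should give the same counts as above (3, 2, 3). Using `vapply()` rather than `sapply()` is a design choice: the result type stays fixed even when a string contains no matches.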

Upvotes: 3
