eastafri

Reputation: 2226

R plot frequency of strings with specific pattern

I have a data frame with a column that contains strings, and I would like to plot the frequency of the strings that match a certain pattern. For example:

strings  <- c("abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")
df <- as.data.frame(strings)
df
     strings
1       abcd
2       defd
3    hfjfjcd
4 kgjgcdjrye
5   yryriiir
6  twtettecd

I would like to plot the frequency of the strings that contain the pattern `"cd"`. Does anyone have a quick solution?

Upvotes: 0

Views: 3588

Answers (3)

Areza

Reputation: 6080

Check the "kernlab" package. You can define a kernel (pattern), which could be any kind of string, and count the matches later on.

Upvotes: 1

IRTFM

Reputation: 263342

Others have already mentioned grepl. Here is an implementation with plot.density, using grepl to code each string as 1 (match) or 0 (no match):

plot( density(0+grepl("cd", strings)) )

If you don't like the density plot extending beyond the range of the data, there are other methods in the 'logspline' package that allow one to get sharp borders at the range extremes. Searching RSiteSearch
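
As a simpler alternative to a density estimate, the 0/1 match indicator can also be tabulated directly and drawn as bars. A minimal base R sketch, assuming the strings vector from the question; the names has_cd and counts, the bar labels, and the choice of fixed = TRUE (literal rather than regex matching) are illustrative and not part of the original answer:

```r
strings <- c("abcd", "defd", "hfjfjcd", "kgjgcdjrye", "yryriiir", "twtettecd")

# TRUE where the string contains the literal substring "cd"
has_cd <- grepl("cd", strings, fixed = TRUE)

# Count non-matches and matches; factor() keeps both levels
# in the table even if one of them never occurs
counts <- table(factor(has_cd,
                       levels = c(FALSE, TRUE),
                       labels = c("no match", "contains cd")))

# One bar per group, so nothing extends beyond the data
barplot(counts, ylab = "Number of strings")
```

With only two possible values, a bar chart of counts may be easier to read than a smoothed density.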

Upvotes: 1

Andrie

Reputation: 179428

I presume from your question that you meant to have some entries that appear more than once, so I've added one duplicate string:

x <- c("abcd","abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")

To find only those strings that contain a specific pattern, use grep or grepl:

y <- x[grepl("cd", x)]

To get a table of frequencies, you can use table:

table(y)

y
      abcd    hfjfjcd kgjgcdjrye  twtettecd 
         2          1          1          1 

And you can plot it using plot or barplot as follows:

barplot(table(y))
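
The steps above can be combined and applied to the data frame column from the question. This is only a sketch: it assumes the column is called strings, as in the question, reuses the duplicated "abcd" entry added for this answer, and introduces the names matched and freq purely for illustration. Sorting the table and rotating the labels with las = 2 are optional presentation choices.

```r
df <- data.frame(
  strings = c("abcd", "abcd", "defd", "hfjfjcd",
              "kgjgcdjrye", "yryriiir", "twtettecd"),
  stringsAsFactors = FALSE  # keep the column as character, not factor
)

# Keep only the strings that contain the literal substring "cd"
matched <- df$strings[grepl("cd", df$strings, fixed = TRUE)]

# Frequency of each matching string, most frequent first
freq <- sort(table(matched), decreasing = TRUE)

# las = 2 rotates the axis labels so longer strings stay readable
barplot(freq, las = 2, ylab = "Frequency")
```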


Upvotes: 2
