Eruthon

Reputation: 21

How to count unique occurrences of data saved in a multi-column table?

I have a table with 3 columns and about 14,000 rows. I want to count how many times each distinct row type (each combination of values across the three columns) occurs.

I am new to R, so I can't come up with a way to extract this from the table. I managed to list all the distinct values in a single column with levels(), but I can't make it work for the combination of columns.

Table looks like this:


My expected result:

IPV4|UDP|UDP: 120 times  
IPV4|UDP|SSDP: 60 times  

...

Upvotes: 0

Views: 65

Answers (1)

Dunois

Reputation: 1843

With some sample data that looks like this:

tst <- data.frame(Type = c("IPV4", " ", "IPV4", "IPV4"), Protocol = c("UDP", " ", "UDP", "UDP"), Protocol.1 = c("SSDP", " ", "UDP", "UDP"))

You can get the tallies with tools from the tidyverse (dplyr, plus the %>% pipe from magrittr, which dplyr re-exports):

library(dplyr)

tst_summary <- tst %>% 
  mutate(class_var = paste(Type, Protocol, Protocol.1, sep = "|")) %>% 
  group_by(class_var) %>% 
  tally()
tst_summary
# # A tibble: 3 x 2
#   class_var         n
#   <chr>         <int>
# 1 " | | "           1
# 2 IPV4|UDP|SSDP     1
# 3 IPV4|UDP|UDP      2

Here we use paste() to concatenate the values of all the columns we want to group by into a single string, stored in a new column class_var (mutate() creates that column). We then group the data by class_var with group_by() and count the rows in each group with tally().
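As a minimal sketch of an alternative (not part of the original answer), dplyr's count() accepts the grouping columns directly, so it can group by Type, Protocol and Protocol.1 without building class_var, and it keeps those columns in the result. The name tst_counts is hypothetical, and the final cat() call, which prints each combination in the "IPV4|UDP|UDP: 120 times" style from the question, is an illustrative addition.

```r
library(dplyr)

# Same sample data as above
tst <- data.frame(Type = c("IPV4", " ", "IPV4", "IPV4"),
                  Protocol = c("UDP", " ", "UDP", "UDP"),
                  Protocol.1 = c("SSDP", " ", "UDP", "UDP"))

# count() groups by every column listed and adds the group size as column n,
# so the original columns are kept and no combined key is needed
tst_counts <- tst %>% count(Type, Protocol, Protocol.1)
tst_counts

# Print each combination in the "key: n times" format used in the question
cat(paste0(paste(tst_counts$Type, tst_counts$Protocol, tst_counts$Protocol.1, sep = "|"),
           ": ", tst_counts$n, " times"), sep = "\n")
```

With count() there is no combined key to split apart afterwards, so the loop in the next step is not needed for this route.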

To get a table with the original columns alongside the counts, one way is to split class_var back into its parts with str_split() from stringr inside a for loop, as shown below.

library(dplyr)
library(stringr)

tst_summary <- tst %>% 
  mutate(class_var = paste(Type, Protocol, Protocol.1, sep = "|")) %>% 
  group_by(class_var) %>% 
  tally() %>% as.data.frame()

# Split each combined key back into its three parts ("|" must be escaped in the regex)
for(i in seq_len(nrow(tst_summary))){
  parts <- str_split(tst_summary$class_var[i], "\\|")[[1]]
  tst_summary$Type[i] <- parts[1]
  tst_summary$Protocol[i] <- parts[2]
  tst_summary$Protocol.1[i] <- parts[3]
}

# Keep the original columns plus the count, in the original order
tst_summary <- tst_summary[, c("Type", "Protocol", "Protocol.1", "n")]

tst_summary
#   Type Protocol Protocol.1 n
# 1                          1
# 2 IPV4      UDP       SSDP 1
# 3 IPV4      UDP        UDP 2
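If you would rather avoid both the loop and the stringr dependency, the split can be done for all rows at once in base R. This is a sketch that assumes no field itself contains "|" and no field is an empty string (base strsplit() drops a trailing empty piece, which would leave that row with only two parts). Here table() stands in for group_by()/tally() so the block runs without any packages; counts and parts are hypothetical helper names.

```r
# Same sample data as above
tst <- data.frame(Type = c("IPV4", " ", "IPV4", "IPV4"),
                  Protocol = c("UDP", " ", "UDP", "UDP"),
                  Protocol.1 = c("SSDP", " ", "UDP", "UDP"))

# Build the combined key and count each distinct value (base R, no packages)
class_var <- paste(tst$Type, tst$Protocol, tst$Protocol.1, sep = "|")
counts <- table(class_var)
tst_summary <- data.frame(class_var = names(counts), n = as.integer(counts))

# Split every key in one call; fixed = TRUE treats "|" literally, so no escaping
parts <- do.call(rbind, strsplit(tst_summary$class_var, "|", fixed = TRUE))
colnames(parts) <- c("Type", "Protocol", "Protocol.1")

# Recombine the split columns with the counts
tst_summary <- cbind(as.data.frame(parts), n = tst_summary$n)
tst_summary
```

Row order follows the sorting done by table(), which can depend on the locale for keys that start with spaces.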

Upvotes: 1
