Reputation: 21
I have a table with 3 columns and about 14,000 rows. I want to count how many times each distinct combination of values (each type of row) occurs.
I am new to R and can't come up with a way to extract this from the table. I managed to list all the distinct values in a single column with levels(), but I can't make it work across the whole row.
Table looks like this:
My expected result:
IPV4|UDP|UDP: 120 times
IPV4|UDP|SSDP: 60 times
...
Upvotes: 0
Views: 65
Reputation: 1843
With some sample data that looks like this:
tst <- data.frame(Type = c("IPV4", " ", "IPV4", "IPV4"), Protocol = c("UDP", " ", "UDP", "UDP"), Protocol.1 = c("SSDP", " ", "UDP", "UDP"))
You could get tallies as follows using tools from the tidyverse (dplyr, magrittr):
library(dplyr)  # dplyr re-exports the %>% pipe from magrittr
tst_summary <- tst %>%
  mutate(class_var = paste(Type, Protocol, Protocol.1, sep = "|")) %>%  # build one key per row
  group_by(class_var) %>%
  tally()  # count the rows in each group
tst_summary
# # A tibble: 3 x 2
# class_var n
# <chr> <int>
# 1 " | | " 1
# 2 IPV4|UDP|SSDP 1
# 3 IPV4|UDP|UDP 2
What we're doing here is concatenating the strings from all the columns you want to group/classify by into a single column, class_var, using paste() (mutate() creates this new class_var column). Then we group the data by this newly created column with group_by() and count the occurrences in each group with tally().
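For comparison, the same concatenate-then-count idea can be sketched in base R without any packages. This is only an illustration under the assumption that the data looks like the tst sample above; the names key, counts and report are made up for the example. The last step formats each tally the way the question's expected result is written.

```r
# Sample data copied from the answer above
tst <- data.frame(Type = c("IPV4", " ", "IPV4", "IPV4"),
                  Protocol = c("UDP", " ", "UDP", "UDP"),
                  Protocol.1 = c("SSDP", " ", "UDP", "UDP"))

# One key per row, e.g. "IPV4|UDP|SSDP"
key <- paste(tst$Type, tst$Protocol, tst$Protocol.1, sep = "|")

# table() counts how often each distinct key occurs
counts <- as.data.frame(table(class_var = key),
                        responseName = "n",
                        stringsAsFactors = FALSE)

# Format as "<key>: <n> times"
report <- paste0(counts$class_var, ": ", counts$n, " times")
print(report)
```

Rows made up of blank strings (" ") get their own key here too, so you may want to drop them before counting.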
Getting a table with the original columns alongside the generated counts would involve a for loop and the str_split() function from stringr, as shown below.
library(dplyr)
library(stringr)
tst_summary <- tst %>%
  mutate(class_var = paste(Type, Protocol, Protocol.1, sep = "|")) %>%
  group_by(class_var) %>%
  tally() %>% as.data.frame()
# Split each key back into its three parts and store them as plain character columns
for (i in seq_len(nrow(tst_summary))) {
  parts <- str_split(tst_summary$class_var[i], fixed("|"))[[1]]
  tst_summary$Type[i] <- parts[1]
  tst_summary$Protocol[i] <- parts[2]
  tst_summary$Protocol.1[i] <- parts[3]
}
# Reorder to Type, Protocol, Protocol.1, n (dropping class_var)
tst_summary <- tst_summary[, c(3, 4, 5, 2)]
tst_summary
# Type Protocol Protocol.1 n
# 1 1
# 2 IPV4 UDP SSDP 1
# 3 IPV4 UDP UDP 2
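As an alternative sketch (not the approach used above), dplyr's count() can group on the three columns directly, which should give the same table without pasting the values together and splitting them again. It assumes the same tst sample data; by_cols is a name chosen for the example.

```r
library(dplyr)

# Sample data copied from the answer above
tst <- data.frame(Type = c("IPV4", " ", "IPV4", "IPV4"),
                  Protocol = c("UDP", " ", "UDP", "UDP"),
                  Protocol.1 = c("SSDP", " ", "UDP", "UDP"))

# count() groups by the listed columns and adds a column n with the group sizes
by_cols <- tst %>%
  count(Type, Protocol, Protocol.1, name = "n")

print(by_cols)
```

Because the grouping columns are kept as they are, this also avoids problems if a value ever contains the "|" separator itself.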
Upvotes: 1