Reputation: 2859
I have the data set below. I would like to group by journal (SO) and then count how many times each semicolon-separated string in AU_UN occurs within each group. Many thanks in advance.
SO = c("Journal Of Business", "Journal Of Business", "Journal of Economy")
AU_UN = c("Dartmouth Coll;Wellesley Coll;Wellesley Coll",
"Georgetown Univ;Fed Reserve Syst",
"Georgetown Univ;Fed Reserve Syst")
df <- data.frame(SO, AU_UN);df
Expected output:
Journal Of Business   Dartmouth Coll (1);Wellesley Coll (2);Georgetown Univ (1);Fed Reserve Syst (1)
Journal of Economy    Georgetown Univ (1);Fed Reserve Syst (1)
Upvotes: 2
Views: 85
Reputation: 269654
Use separate_rows to convert to long form (one row per string), count the rows for each journal/string pair, and convert back to one row per journal with summarize.
library(dplyr)
library(tidyr)
df %>%
  separate_rows(AU_UN, sep = ";") %>%
  count(SO, AU_UN) %>%
  group_by(SO) %>%
  summarize(AU_UN = paste(sprintf("%s (%d)", AU_UN, n), collapse = ";"), .groups = "drop")
giving:
# A tibble: 2 x 2
  SO                  AU_UN
  <chr>               <chr>
1 Journal Of Business Dartmouth Coll (1);Fed Reserve Syst (1);Georgetown Univ (1);Wellesley Coll (2)
2 Journal of Economy  Fed Reserve Syst (1);Georgetown Univ (1)
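Note that count() sorts the strings alphabetically, whereas the expected output in the question lists them in order of appearance. If that order matters, one possible sketch is to turn AU_UN into a factor whose levels follow first appearance before counting. This assumes the order of first appearance across the whole data frame is what is wanted (not the order within each journal); the variable name res is only illustrative.

```r
library(dplyr)
library(tidyr)

SO <- c("Journal Of Business", "Journal Of Business", "Journal of Economy")
AU_UN <- c("Dartmouth Coll;Wellesley Coll;Wellesley Coll",
           "Georgetown Univ;Fed Reserve Syst",
           "Georgetown Univ;Fed Reserve Syst")
df <- data.frame(SO, AU_UN)

res <- df %>%
  separate_rows(AU_UN, sep = ";") %>%
  # Assumption: keep strings in order of first appearance in the whole data frame
  mutate(AU_UN = factor(AU_UN, levels = unique(AU_UN))) %>%
  # count() orders a factor column by its levels, not alphabetically
  count(SO, AU_UN) %>%
  group_by(SO) %>%
  summarize(AU_UN = paste(sprintf("%s (%d)", as.character(AU_UN), n), collapse = ";"),
            .groups = "drop")

res
```

With the sample data this should list Dartmouth Coll first and Fed Reserve Syst last for Journal Of Business, matching the order in the question.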
Upvotes: 1
Reputation: 6628
Using base::strsplit() we can extract the substrings. strsplit() returns a list that contains a vector of the strings for each row. The new list-column (or nested column) can be unnested with tidyr::unnest(). To get the frequencies of each string for each journal, we use dplyr::count().
library(tidyverse)
df %>%
  mutate(strings = strsplit(AU_UN, ";")) %>%
  unnest(strings) %>%
  count(SO, strings)
#> # A tibble: 6 x 3
#>   SO                  strings              n
#>   <chr>               <chr>            <int>
#> 1 Journal Of Business Dartmouth Coll       1
#> 2 Journal Of Business Fed Reserve Syst     1
#> 3 Journal Of Business Georgetown Univ      1
#> 4 Journal Of Business Wellesley Coll       2
#> 5 Journal of Economy  Fed Reserve Syst     1
#> 6 Journal of Economy  Georgetown Univ      1
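The same split-then-count idea can also be sketched in base R alone, which additionally collapses the counts into the "name (n)" format asked for in the question. This is only a rough sketch, not part of the answer above: the names parts, long, per_journal and result are illustrative, and the strings come out in the alphabetical order that table() produces.

```r
SO <- c("Journal Of Business", "Journal Of Business", "Journal of Economy")
AU_UN <- c("Dartmouth Coll;Wellesley Coll;Wellesley Coll",
           "Georgetown Univ;Fed Reserve Syst",
           "Georgetown Univ;Fed Reserve Syst")
df <- data.frame(SO, AU_UN)

# strsplit() returns a list with one character vector per row
parts <- strsplit(as.character(df$AU_UN), ";", fixed = TRUE)

# Long form: repeat each journal once per extracted string
long <- data.frame(
  SO = rep(as.character(df$SO), lengths(parts)),
  strings = unlist(parts),
  stringsAsFactors = FALSE
)

# Tabulate the strings within each journal and paste them as "name (n)"
per_journal <- split(long$strings, long$SO)
result <- vapply(per_journal, function(x) {
  tab <- table(x)
  paste(sprintf("%s (%d)", names(tab), as.integer(tab)), collapse = ";")
}, character(1))

result
```

result is a named character vector with one element per journal, so result[["Journal of Economy"]] gives the collapsed string for that journal.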
Upvotes: 2