How can I draw a similar graph with ggplot2 to find the difference between two works?

Question

I am reading the book Text Mining with R: A Tidy Approach by Julia Silge & David Robinson and want to find the difference between two works, rather than the three compared in the book. How can I draw a similar graph with ggplot2?

In the original book:

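```r
library(dplyr)
library(tidyr)
library(tidytext)    # unnest_tokens(), stop_words
library(janeaustenr) # austen_books()
library(gutenbergr)  # gutenberg_download()

austen <- austen_books() %>%
  select(-book) %>%
  mutate(author = "Jane Austen")
bronte <- gutenberg_download(c(1260, 768, 969, 9182, 767)) %>%
  select(-gutenberg_id) %>%
  mutate(author = "Brontë Sisters")
hgwells <- gutenberg_download(c(35, 36, 5230, 159)) %>%
  select(-gutenberg_id) %>%
  mutate(author = "H.G. Wells")

# Not shown in the question (assumed): one row per author and word, with count n
books <- bind_rows(austen, bronte, hgwells) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(author, word, sort = TRUE)

comparison_df <- books %>%
  add_count(author, wt = n, name = "total_word") %>%  # total words per author
  mutate(proportion = n / total_word) %>%
  select(-total_word, -n) %>%
  pivot_wider(names_from = author, values_from = proportion,
              values_fill = list(proportion = 0)) %>%
  pivot_longer(3:4, names_to = "other", values_to = "proportion")
```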

```r
comparison_df
#> # A tibble: 56,002 x 4
#>   word  `Jane Austen` other          proportion
#>   <chr>         <dbl> <chr>               <dbl>
#> 1 miss        0.00855 Brontë Sisters  0.00342  
#> 2 miss        0.00855 H.G. Wells      0.000120 
#> 3 time        0.00615 Brontë Sisters  0.00424  
#> 4 time        0.00615 H.G. Wells      0.00682  
#> 5 fanny       0.00449 Brontë Sisters  0.0000438
#> 6 fanny       0.00449 H.G. Wells      0        
#> # ... with 5.6e+04 more rows
```
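To make it clearer why `pivot_longer(3:4)` turns the two non-Austen authors into the `other` column (and therefore into the two facets of the plot), here is a sketch of the same pipeline split into steps. It runs on a small hypothetical `books` tibble with made-up counts, assuming the real `books` has the columns `author`, `word` and `n`; the intermediate names `props`, `wide` and `long` are illustrative only.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy counts, NOT taken from the real books
books <- tibble(
  author = c("Jane Austen", "Jane Austen", "Brontë Sisters", "Brontë Sisters", "H.G. Wells"),
  word   = c("miss", "time", "miss", "time", "time"),
  n      = c(3, 1, 1, 1, 2)
)

# Step 1: per-author proportions (each author's proportions sum to 1)
props <- books %>%
  add_count(author, wt = n, name = "total_word") %>%
  mutate(proportion = n / total_word) %>%
  select(-total_word, -n)

# Step 2: one column per author, in order of first appearance in the data;
# words an author never uses are filled with 0
wide <- props %>%
  pivot_wider(names_from = author, values_from = proportion,
              values_fill = list(proportion = 0))
names(wide)  # word, Jane Austen, Brontë Sisters, H.G. Wells

# Step 3: columns 3:4 (the two non-Austen authors) are stacked into `other`,
# so `Jane Austen` is repeated once per other author
long <- wide %>%
  pivot_longer(3:4, names_to = "other", values_to = "proportion")
long
```

Because the column order comes from the order in which authors first appear, `3:4` only selects the right columns if `Jane Austen` ends up in column 2.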

But what if I only want to compare two works, for example Austen and the Brontë sisters?

```r
library(ggplot2)
library(scales)  # label_percent()

comparison_df %>%
  filter(proportion > 1 / 1e5) %>%
  ggplot(aes(proportion, `Jane Austen`)) +
  geom_abline(color = "gray40", lty = 2) +
  geom_jitter(aes(color = abs(`Jane Austen` - proportion)),
              alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
  scale_x_log10(labels = label_percent()) +
  scale_y_log10(labels = label_percent()) +
  scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
  facet_wrap(~ other) +
  guides(color = "none")  # `color = FALSE` is deprecated in current ggplot2
```

How can I modify the code above to do this?
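One possible approach, sketched under assumptions: with only two authors there is no third column for `pivot_longer()` to stack, so you can filter `books` to the two authors, stop after `pivot_wider()`, map each author's proportion to one axis, and drop `facet_wrap()`. The `books` tibble below is hypothetical toy data standing in for the real word counts (columns `author`, `word`, `n` assumed), and `two_authors` and `plot_data` are illustrative names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)  # label_percent()

# Hypothetical toy word counts standing in for `books` (NOT real data)
books <- tibble(
  author = c("Jane Austen", "Jane Austen", "Jane Austen",
             "Brontë Sisters", "Brontë Sisters", "H.G. Wells"),
  word   = c("miss", "time", "fanny", "miss", "time", "time"),
  n      = c(30, 20, 10, 10, 25, 40)
)

two_authors <- books %>%
  # keep only the two authors being compared
  filter(author %in% c("Jane Austen", "Brontë Sisters")) %>%
  add_count(author, wt = n, name = "total_word") %>%
  mutate(proportion = n / total_word) %>%
  select(-total_word, -n) %>%
  # one proportion column per author; no pivot_longer() needed for a single pair
  pivot_wider(names_from = author, values_from = proportion,
              values_fill = list(proportion = 0))

# Both axes are log10, so drop words that are (near) absent for either author
plot_data <- two_authors %>%
  filter(`Jane Austen` > 1 / 1e5, `Brontë Sisters` > 1 / 1e5)

p <- ggplot(plot_data, aes(`Brontë Sisters`, `Jane Austen`)) +
  geom_abline(color = "gray40", lty = 2) +
  geom_jitter(aes(color = abs(`Jane Austen` - `Brontë Sisters`)),
              alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
  scale_x_log10(labels = label_percent()) +
  scale_y_log10(labels = label_percent()) +
  scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
  guides(color = "none")  # single panel, so no facet_wrap()

p
```

Filtering on both columns is a design choice: the book's version filters only `proportion`, and any remaining zeros in `Jane Austen` would become `-Inf` on the log scale and be dropped with a warning. If you would rather keep the book's `comparison_df` unchanged, adding `filter(other == "Brontë Sisters")` before `ggplot()` and removing `facet_wrap(~ other)` should also give a single Austen vs. Brontë panel.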
