Reputation: 181
this question is a followup from https://stackoverflow.com/a/64991805?noredirect=1. i have a dataset, dictfreq, with one dictionary column, dict, and the frequency of each dictionary word in three sources: dictfreq$freq_gov, dictfreq$freq_indiv, dictfreq$freq_media.
dict: apple, pear, pineapple
freq_gov: 12, 13, 10
freq_indiv: 11, 20, 1
freq_media: 13, 21, 9
desired output looks like this: https://blog.revolutionanalytics.com/2015/12/r-is-the-fastest-growing-language-on-stackoverflow.html where y-axis has:
- rank going from 1-3
- list of the words from dict (apple, pear, pineapple),
and x-axis has:
- categories of freq_gov, freq_indiv, freq_media
basically, i want to visualize comparison of the frequency of each word in dict across gov, indiv, and media.
this is the code template i have been trying to revise so far:
p <- ggplot(mapping = aes(dictfreq, y = rank, group = tag, color = tag)) +
geom_line(size = 1.7, alpha = 0.25, data = dictfreq) +
geom_line(size = 2.5, data = dictfreq %>% filter(tag %in% names(colors)[colors != "gray"])) +
geom_point(size = 4, alpha = 0.25, data = dictfreq) +
geom_point(size = 4, data = dftags4 %>% filter(tag %in% names(colors)[colors != "gray"])) +
geom_point(size = 1.75, color = "white", data = dictfreq) +
geom_text(data = dftags5, aes(label = tag), hjust = -0, size = 4.5) +
geom_text(data = dftags6, aes(label = tag), hjust = 1, size = 4.5) +
scale_color_manual(values = colors) +
ggtitle("The subway-style-rank-year-tag plot:\nPast and the Future") +
xlab("Top Tags by Year in Stackoverflow") +
scale_x_continuous(breaks = seq(min(dftags4$creationyear) - 2,
max(dftags4$creationyear) + 2),
limits = c(min(dftags4$creationyear) - 1.0,
max(dftags4$creationyear) + 0.5))
p
but i am having trouble molding it to my data. specifically, my x axis will be three categorical sections (media, gov, indiv) that are not a separate variable in my data. what should i do??
--
edit: including the actual data here - dput() as suggested:
structure(list(word = c("apple", "apple", "apple",
"mandarin", "mandarin", "mandarin", "orange", "orange", "orange", "pear"),
name = c("freq_ongov", "freq_onindiv", "freq_onmedia", "freq_ongov",
"freq_onindiv", "freq_onmedia", "freq_ongov", "freq_onindiv",
"freq_onmedia", "freq_ongov"), value = c(0, 87, 63, 0, 44,
20, 3, 27, 25, 0), rank = c(26, 85, 70, 26, 61, 42.5, 86,
47, 48, 26)), row.names = c(NA, -10L), groups = structure(list(
name = c("freq_ongov", "freq_onindiv", "freq_onmedia"), .rows = structure(list(
c(1L, 4L, 7L, 10L), c(2L, 5L, 8L), c(3L, 6L, 9L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
should be noted that the actual data has 160 unique dict words!
--
update: i followed Allan's suggestion, and the pivot_longer() function worked, but i am stuck on an error when i try to generate the actual ggplot. this is my code:
ggplot(mergedicts, aes(name, rank, color = word, group = word)) +
geom_line(size = 200) +
geom_point(shape = 21, fill = "white", size =200) +
scale_y_continuous(breaks = seq(max(mergedicts$rank)), labels = leftlabels,
sec.axis = sec_axis(~., breaks = seq(max(mergedicts$rank)),
labels = rightlabels)) +
scale_x_discrete(expand = c(0.01, 0)) +
guides(color = guide_none()) +
coord_cartesian(clip = "off") +
theme(axis.ticks.length.y = unit(0, "points"))
which gives the error:
Error: `breaks` and `labels` must have the same length
Run `rlang::last_error()` to see where the error occurred.
6. stop(fallback)
5. signal_abort(cnd)
4. abort("`breaks` and `labels` must have the same length")
3. check_breaks_labels(breaks, labels)
2. continuous_scale(c("y", "ymin", "ymax", "yend", "yintercept", "ymin_final", "ymax_final", "lower", "middle", "upper", "y0"), "position_c", identity, name = name, breaks = breaks, n.breaks = n.breaks, minor_breaks = minor_breaks, labels = labels, limits = limits, ...
1. scale_y_continuous(breaks = seq(max(mergedicts$rank)), labels = leftlabels, sec.axis = sec_axis(~., breaks = seq(max(mergedicts$rank)), labels = rightlabels))
any suggestions??
Upvotes: 0
Views: 249
Reputation: 174468
It's hard to follow your example, because your data is not presented in a standard way. I think you mean you have a data frame with four columns like this:
dictfreq <- data.frame(dict = c("apple", "pear", "pineapple"),
freq_gov = c(12, 13, 10),
freq_indiv = c(11, 20, 1),
freq_media = c(13, 21, 9))
dictfreq
#> dict freq_gov freq_indiv freq_media
#> 1 apple 12 11 13
#> 2 pear 13 20 21
#> 3 pineapple 10 1 9
Now if that is the case, your first task will be to pivot this data into long format, and get the rank for each of the three categorical variables:
library(ggplot2)
library(dplyr)
library(tidyr)
df <- pivot_longer(dictfreq, -1) %>% group_by(name) %>% mutate(rank = rank(value))
df
#> # A tibble: 9 x 4
#> # Groups: name [3]
#> dict name value rank
#> <fct> <chr> <dbl> <dbl>
#> 1 apple freq_gov 12 2
#> 2 apple freq_indiv 11 2
#> 3 apple freq_media 13 2
#> 4 pear freq_gov 13 3
#> 5 pear freq_indiv 20 3
#> 6 pear freq_media 21 3
#> 7 pineapple freq_gov 10 1
#> 8 pineapple freq_indiv 1 1
#> 9 pineapple freq_media 9 1
Note that for your example, the rank doesn't change for the three dictionary items across your categories: pear is always the highest, followed by apple, followed by pineapple. This doesn't make for a very interesting plot, but let's roll with it for now. You will need to define the labels for the left hand and right hand axis according to which fruit should appear there. You can do that like this:
leftlabels <- df$dict[df$name == "freq_gov"]
leftlabels <- leftlabels[order(df$rank[df$name == "freq_gov"])]
rightlabels <- df$dict[df$name == "freq_media"]
rightlabels <- rightlabels[order(df$rank[df$name == "freq_media"])]
Now you are ready to plot. You will need to include a secondary axis:
ggplot(df, aes(name, rank, color = dict, group = dict)) +
geom_line(size = 4) +
geom_point(shape = 21, fill = "white", size = 4) +
scale_y_continuous(breaks = seq(max(df$rank)), labels = leftlabels,
sec.axis = sec_axis(~., breaks = seq(max(df$rank)),
labels = rightlabels)) +
scale_x_discrete(expand = c(0.01, 0)) +
guides(color = guide_none()) +
coord_cartesian(clip = "off") +
theme(axis.ticks.length.y = unit(0, "points"))
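One thing to watch: `breaks = seq(max(df$rank))` only lines up with the labels when the ranks in each category are the whole numbers 1 to n, one per word. By default `rank()` averages ties, so tied frequencies produce values like 1.5, and the breaks and labels can end up with different lengths. Below is a minimal base-R sketch of one way around that. It assumes ties can be broken by row order with `ties.method = "first"`, which is my choice for illustration rather than something taken from your data, and the small data frame is made up.

```r
# Toy long-format data (illustrative values); freq_ongov has a tie at 0
df <- data.frame(
  word  = rep(c("apple", "mandarin", "orange"), each = 3),
  name  = rep(c("freq_ongov", "freq_onindiv", "freq_onmedia"), times = 3),
  value = c(0, 87, 63, 0, 44, 20, 3, 27, 25),
  stringsAsFactors = FALSE
)

# Default rank() averages ties, so ranks within a category need not be 1..n
default_rank <- ave(df$value, df$name, FUN = rank)

# Assumption: break ties by order of appearance so each category gets 1..n
df$rank <- ave(df$value, df$name,
               FUN = function(v) rank(v, ties.method = "first"))

# One break per word, independent of the largest rank value
n_words <- length(unique(df$word))
breaks  <- seq_len(n_words)

# Labels ordered by rank within the first and last category
gov   <- df[df$name == "freq_ongov", ]
media <- df[df$name == "freq_onmedia", ]
leftlabels  <- gov$word[order(gov$rank)]
rightlabels <- media$word[order(media$rank)]

# Both label vectors now match the breaks in length
stopifnot(length(breaks) == length(leftlabels),
          length(breaks) == length(rightlabels))
```

With ranks built this way, `breaks` can be passed to both `scale_y_continuous()` and `sec_axis()` in place of `seq(max(df$rank))`. It is also worth checking that `leftlabels` and `rightlabels` were rebuilt from the same data frame that is being plotted.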
Like I say, this is not a very interesting plot because that's just what the data show. However, if we try it with slightly more interesting data:
dictfreq <- data.frame(dict = c("apple", "pear", "pineapple", "banana", "kiwi"),
freq_gov = c(10, 13, 9, 14, 11),
freq_indiv = c(11, 22, 1, 6, 16),
freq_media = c(13, 21, 9, 10, 8))
Now we run exactly the same code and we can see this is much closer to the kind of thing you are looking for:
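Since the figure is not reproduced here, the ranks that the plot would draw can be checked directly. This is a rough base-R sketch of that check, not the plotting code itself; the `ranks` matrix is a name introduced only for illustration.

```r
# The five-word example data from above
dictfreq <- data.frame(dict = c("apple", "pear", "pineapple", "banana", "kiwi"),
                       freq_gov = c(10, 13, 9, 14, 11),
                       freq_indiv = c(11, 22, 1, 6, 16),
                       freq_media = c(13, 21, 9, 10, 8),
                       stringsAsFactors = FALSE)

# Rank each frequency column separately (1 = lowest frequency)
ranks <- sapply(dictfreq[-1], rank)
rownames(ranks) <- dictfreq$dict

# One row per word, one column per category
ranks
```

Here the ranks differ between columns (banana, for instance, is ranked 5 under freq_gov but 2 under freq_indiv), which is what makes the lines cross in the plot.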
Upvotes: 1