Reputation: 879
I have a dataset in long format. It contains policy evaluations from two countries, Poland and Germany, in five columns: cntry (country), wgt_2 (weights), type (the policy being evaluated), value (the score the respondent gave to the policy), and labels (the meaning of value as a string).
I would like to plot a weighted density curve of the scores, with the countries as two lines and type as the facet. I have run into two issues:
I don't know how to integrate the weights into the density plot. They are included in the dataset as wgt_2.
I would like to have labels instead of value on the horizontal axis, so that the reader immediately knows what the evaluation scale was. The problem is that mapping labels to x produces a line in which ggplot also estimates a density for the "in-between" values between the factor levels, so the line becomes wiggly. I tried scale_x_discrete, and I also tried the approach suggested here; neither helped. I include a picture of what I mean:
This is the command I used:
ggplot(plot_dat, aes(x = labels, color = cntry, group = cntry)) +
  geom_density() +
  facet_wrap(~type)
This is a 100-row sample of the dataset to reproduce the issue:
structure(list(cntry = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L), .Label = c("Germany",
"Poland"), class = "factor"), wgt_2 = structure(c(1.27960370623135,
1.12172797554474, 1.12172797554474, 0.894262014366493, 1.00972997045152,
1.13313617678755, 1.32877801805357, 0.759155232925338, 1.13313617678755,
0.884543585038424, 1.13313617678755, 0.884543585038424, 1.26672089753564,
1.08715705397184, 1.20856396838766, 1.09821366192373, 0.944801135944303,
0.84461528487141, 1.08715705397184, 1.13313617678755, 1.00733073227995,
0.853205193791076, 0.853205193791076, 1.09821366192373, 0.66171219592128,
1.01923047425237, 1.19639637436972, 0.767496027664015, 1.00733073227995,
0.835436393423981, 0.791262177881762, 0.535937860607983, 0.903356840604329,
1.01494775076143, 0.95965888977453, 1.05528409877768, 1.27960370623135,
1.13313617678755, 0.766875995766742, 0.987425989567564, 1.13313617678755,
1.19639637436972, 0.948787865326323, 1.12172797554474, 1.34229196026369,
1.00295405332661, 0.959796632690522, 1.00733073227995, 0.84461528487141,
1.05528409877768, 0.84461528487141, 1.08715705397184, 1.20856396838766,
1.09821366192373, 1.12172797554474, 0.893539572876972, 1.01923047425237,
0.759155232925338, 0.84461528487141, 0.971134847547882, 1.26672089753564,
1.13313617678755, 0.947612622945283, 0.766875995766742, 0.843932951154142,
0.84461528487141, 1.00309801053618, 1.01494775076143, 0.655050202375811,
0.655050202375811, 1.01923047425237, 1.01923047425237, 1.19639637436972,
1.26672089753564, 1.12172797554474, 0.84461528487141, 0.938072237840432,
1.34229196026369, 1.13313617678755, 0.955626481232642, 1.09821366192373,
1.08715705397184, 0.84461528487141, 1.00309801053618, 0.95965888977453,
0.84461528487141, 1.20856396838766, 1.08715705397184, 0.558604275386284,
0.853205193791076, 0.775301618081247, 0.938072237840432, 1.00548716730424,
0.894262014366493, 0.937314403677854, 1.09821366192373, 1.00309801053618,
1.19639637436972, 1.00548716730424, 1.32877801805357), label = "weight with 2 lvl education", format.stata = "%9.0g"),
type = c("Economic meassures", "Health meassures", "Health meassures",
"Economic meassures", "Health meassures", "Economic meassures",
"Health meassures", "Health meassures", "Economic meassures",
"Health meassures", "Economic meassures", "Economic meassures",
"Health meassures", "Health meassures", "Economic meassures",
"Economic meassures", "Health meassures", "Economic meassures",
"Economic meassures", "Health meassures", "Health meassures",
"Economic meassures", "Economic meassures", "Economic meassures",
"Health meassures", "Economic meassures", "Health meassures",
"Health meassures", "Health meassures", "Health meassures",
"Economic meassures", "Economic meassures", "Health meassures",
"Health meassures", "Health meassures", "Health meassures",
"Economic meassures", "Health meassures", "Health meassures",
"Economic meassures", "Health meassures", "Health meassures",
"Economic meassures", "Economic meassures", "Economic meassures",
"Economic meassures", "Economic meassures", "Health meassures",
"Economic meassures", "Economic meassures", "Economic meassures",
"Health meassures", "Economic meassures", "Economic meassures",
"Economic meassures", "Health meassures", "Health meassures",
"Economic meassures", "Economic meassures", "Economic meassures",
"Health meassures", "Economic meassures", "Economic meassures",
"Health meassures", "Health meassures", "Economic meassures",
"Economic meassures", "Health meassures", "Health meassures",
"Economic meassures", "Health meassures", "Health meassures",
"Health meassures", "Health meassures", "Economic meassures",
"Health meassures", "Health meassures", "Economic meassures",
"Economic meassures", "Health meassures", "Health meassures",
"Economic meassures", "Health meassures", "Economic meassures",
"Economic meassures", "Economic meassures", "Economic meassures",
"Health meassures", "Health meassures", "Economic meassures",
"Economic meassures", "Economic meassures", "Health meassures",
"Economic meassures", "Health meassures", "Health meassures",
"Health meassures", "Health meassures", "Economic meassures",
"Health meassures"), value = structure(c(2, 2, 2, 4, 1, 2,
3, 4, 1, 3, 2, 3, 4, 5, 1, 3, 3, 3, 3, 3, 4, 1, 3, 1, 3,
3, 2, 3, 3, 1, 3, 3, 4, 3, 2, 2, 3, 3, 3, 1, 3, 2, 2, 3,
1, 3, 2, 3, 2, 1, 1, 3, 4, 3, 1, 3, 2, 2, 2, 3, 3, 1, 2,
5, 1, 3, 1, 3, 5, 2, 1, 4, 1, 2, 2, 3, 2, 3, 3, 1, 3, 2,
3, 1, 2, 3, 2, 2, 3, 3, 2, 5, 2, 2, 2, 3, 2, 3, 1, 3), labels = c(`not at all sufficient` = 1,
`rather not sufficient` = 2, appropriate = 3, `rather too restrictive` = 4,
`extremely restrictive` = 5), label = "measures to overcome health risks due to corona", class = c("haven_labelled",
"vctrs_vctr", "double")), labels = structure(c(2L, 2L, 2L,
4L, 1L, 2L, 3L, 4L, 1L, 3L, 2L, 3L, 4L, 5L, 1L, 3L, 3L, 3L,
3L, 3L, 4L, 1L, 3L, 1L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 3L, 4L,
3L, 2L, 2L, 3L, 3L, 3L, 1L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 3L,
2L, 1L, 1L, 3L, 4L, 3L, 1L, 3L, 2L, 2L, 2L, 3L, 3L, 1L, 2L,
5L, 1L, 3L, 1L, 3L, 5L, 2L, 1L, 4L, 1L, 2L, 2L, 3L, 2L, 3L,
3L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 3L, 3L, 2L, 5L, 2L,
2L, 2L, 3L, 2L, 3L, 1L, 3L), .Label = c("not at all sufficient",
"rather not sufficient", "appropriate", "rather too restrictive",
"extremely restrictive"), class = "factor")), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 1
Reputation: 78907
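I am not sure either, but passing the weights through the weight aesthetic gives a quite different plot (df here is the sample data from the question):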
ggplot(df, aes(x = labels, weight = wgt_2, color = cntry, group = cntry)) +
  geom_density() +
  facet_wrap(~type)
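To see what the weights do to the estimate, here is a minimal base-R sketch. As far as I understand, geom_density() passes the weight aesthetic on to stats::density(), which expects weights that sum to 1, and recent ggplot2 versions appear to rescale them within each group. The vectors value and wgt below are made up for illustration; they are not the question's data.

```r
# Toy data (hypothetical): scores on a 1-5 scale with survey weights
value <- c(1, 2, 2, 3, 3, 3, 4, 5)
wgt   <- c(1.3, 0.8, 1.1, 0.9, 1.2, 0.7, 1.0, 1.0)

# density() expects weights that sum to 1, so normalise them first
w <- wgt / sum(wgt)

# Use the same fixed bandwidth so the two curves differ only by the weights
bw <- bw.nrd0(value)
d_unweighted <- density(value, bw = bw)
d_weighted   <- density(value, weights = w, bw = bw)

# Riemann-sum check that the weighted curve is still a proper density
area_weighted <- sum(d_weighted$y) * diff(d_weighted$x[1:2])

# Largest pointwise gap between the weighted and unweighted curves
max_gap <- max(abs(d_weighted$y - d_unweighted$y))
```

If max_gap is clearly above zero, the weights are changing the shape of the curve, which would explain why the weighted plot looks different from the unweighted one.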
Upvotes: 0
Reputation: 66415
One way to incorporate the weights into your density plot is to use uncount to make more copies of each observation in proportion to its weight. You can also reduce the wiggliness of your lines by changing the smoothing bandwidth with bw or adjust. Here I've set adjust to 1.5 so that it uses a wider bandwidth and is smoother.
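library(tidyverse)
plot_dat %>%
  # fct_relabel() wraps the level names but keeps their original order;
  # str_wrap() alone returns character, which ggplot would sort alphabetically
  mutate(labels_wrap = fct_relabel(labels, str_wrap, width = 12)) %>%
  # uncount() needs whole-number weights, so round after scaling
  uncount(as.integer(round(wgt_2 * 100))) %>%
  ggplot(aes(x = labels_wrap, color = cntry, group = cntry)) +
  geom_density(adjust = 1.5) +
  facet_wrap(~type) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))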
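A variation that may be worth trying, sketched under some assumptions: keep the numeric score on the x axis, pass the weights through the weight aesthetic, and only relabel the axis breaks with scale_x_continuous(). The data frame toy and the vector lab are made-up stand-ins for plot_dat and its factor levels, so the curves themselves mean nothing.

```r
library(ggplot2)

# Hypothetical stand-in for plot_dat (random values, for illustration only)
set.seed(1)
lab <- c("not at all sufficient", "rather not sufficient", "appropriate",
         "rather too restrictive", "extremely restrictive")
toy <- data.frame(
  cntry = sample(c("Germany", "Poland"), 200, replace = TRUE),
  type  = sample(c("Economic measures", "Health measures"), 200, replace = TRUE),
  value = sample(1:5, 200, replace = TRUE),
  wgt_2 = runif(200, 0.5, 1.5)
)

# Numeric x keeps the density estimate on the 1-5 scale;
# the labels are only attached to the axis breaks
p <- ggplot(toy, aes(x = value, weight = wgt_2, color = cntry)) +
  geom_density(adjust = 1.5) +
  scale_x_continuous(breaks = 1:5, labels = lab) +
  facet_wrap(~type) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p)
```

With the real data, value is a haven_labelled column, so it would probably need to be mapped as x = as.numeric(value). Older ggplot2 versions may warn if the weights do not sum to 1.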
Upvotes: 1