Erdne Htábrob

Reputation: 879

ggplot density with factor variable

I have a dataset in long format. It contains policy evaluations from two countries, Poland and Germany. There are five columns: cntry (country), wgt_2 (weights), type (the policy being evaluated), value (the score the respondent gave to the policy), and labels (the meaning of value as a string).

I would like to plot a weighted density curve of the scores, with the two countries as separate lines and type as the facet. I have run into two issues:

  1. I don't know how to integrate the weights into the density plot. They are included in the dataset as wgt_2.

  2. I would like to have labels instead of value on the horizontal axis, so that the reader immediately knows what the evaluation scale was. The problem is that when I use labels, ggplot also estimates the density for the "in-between" positions between the factor levels, so the line becomes wiggly. I tried scale_x_discrete, and I also tried the approach suggested here; neither helped. I include a picture of what I mean:

enter image description here

This is the command I used:

ggplot(plot_dat, aes(x=labels, color=cntry, group=cntry)) +
  geom_density() +
  facet_wrap(~type)

This is a 100 row sample of the dataset to replicate the issue:

structure(list(cntry = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L), .Label = c("Germany", 
"Poland"), class = "factor"), wgt_2 = structure(c(1.27960370623135, 
1.12172797554474, 1.12172797554474, 0.894262014366493, 1.00972997045152, 
1.13313617678755, 1.32877801805357, 0.759155232925338, 1.13313617678755, 
0.884543585038424, 1.13313617678755, 0.884543585038424, 1.26672089753564, 
1.08715705397184, 1.20856396838766, 1.09821366192373, 0.944801135944303, 
0.84461528487141, 1.08715705397184, 1.13313617678755, 1.00733073227995, 
0.853205193791076, 0.853205193791076, 1.09821366192373, 0.66171219592128, 
1.01923047425237, 1.19639637436972, 0.767496027664015, 1.00733073227995, 
0.835436393423981, 0.791262177881762, 0.535937860607983, 0.903356840604329, 
1.01494775076143, 0.95965888977453, 1.05528409877768, 1.27960370623135, 
1.13313617678755, 0.766875995766742, 0.987425989567564, 1.13313617678755, 
1.19639637436972, 0.948787865326323, 1.12172797554474, 1.34229196026369, 
1.00295405332661, 0.959796632690522, 1.00733073227995, 0.84461528487141, 
1.05528409877768, 0.84461528487141, 1.08715705397184, 1.20856396838766, 
1.09821366192373, 1.12172797554474, 0.893539572876972, 1.01923047425237, 
0.759155232925338, 0.84461528487141, 0.971134847547882, 1.26672089753564, 
1.13313617678755, 0.947612622945283, 0.766875995766742, 0.843932951154142, 
0.84461528487141, 1.00309801053618, 1.01494775076143, 0.655050202375811, 
0.655050202375811, 1.01923047425237, 1.01923047425237, 1.19639637436972, 
1.26672089753564, 1.12172797554474, 0.84461528487141, 0.938072237840432, 
1.34229196026369, 1.13313617678755, 0.955626481232642, 1.09821366192373, 
1.08715705397184, 0.84461528487141, 1.00309801053618, 0.95965888977453, 
0.84461528487141, 1.20856396838766, 1.08715705397184, 0.558604275386284, 
0.853205193791076, 0.775301618081247, 0.938072237840432, 1.00548716730424, 
0.894262014366493, 0.937314403677854, 1.09821366192373, 1.00309801053618, 
1.19639637436972, 1.00548716730424, 1.32877801805357), label = "weight with 2 lvl education", format.stata = "%9.0g"), 
    type = c("Economic meassures", "Health meassures", "Health meassures", 
    "Economic meassures", "Health meassures", "Economic meassures", 
    "Health meassures", "Health meassures", "Economic meassures", 
    "Health meassures", "Economic meassures", "Economic meassures", 
    "Health meassures", "Health meassures", "Economic meassures", 
    "Economic meassures", "Health meassures", "Economic meassures", 
    "Economic meassures", "Health meassures", "Health meassures", 
    "Economic meassures", "Economic meassures", "Economic meassures", 
    "Health meassures", "Economic meassures", "Health meassures", 
    "Health meassures", "Health meassures", "Health meassures", 
    "Economic meassures", "Economic meassures", "Health meassures", 
    "Health meassures", "Health meassures", "Health meassures", 
    "Economic meassures", "Health meassures", "Health meassures", 
    "Economic meassures", "Health meassures", "Health meassures", 
    "Economic meassures", "Economic meassures", "Economic meassures", 
    "Economic meassures", "Economic meassures", "Health meassures", 
    "Economic meassures", "Economic meassures", "Economic meassures", 
    "Health meassures", "Economic meassures", "Economic meassures", 
    "Economic meassures", "Health meassures", "Health meassures", 
    "Economic meassures", "Economic meassures", "Economic meassures", 
    "Health meassures", "Economic meassures", "Economic meassures", 
    "Health meassures", "Health meassures", "Economic meassures", 
    "Economic meassures", "Health meassures", "Health meassures", 
    "Economic meassures", "Health meassures", "Health meassures", 
    "Health meassures", "Health meassures", "Economic meassures", 
    "Health meassures", "Health meassures", "Economic meassures", 
    "Economic meassures", "Health meassures", "Health meassures", 
    "Economic meassures", "Health meassures", "Economic meassures", 
    "Economic meassures", "Economic meassures", "Economic meassures", 
    "Health meassures", "Health meassures", "Economic meassures", 
    "Economic meassures", "Economic meassures", "Health meassures", 
    "Economic meassures", "Health meassures", "Health meassures", 
    "Health meassures", "Health meassures", "Economic meassures", 
    "Health meassures"), value = structure(c(2, 2, 2, 4, 1, 2, 
    3, 4, 1, 3, 2, 3, 4, 5, 1, 3, 3, 3, 3, 3, 4, 1, 3, 1, 3, 
    3, 2, 3, 3, 1, 3, 3, 4, 3, 2, 2, 3, 3, 3, 1, 3, 2, 2, 3, 
    1, 3, 2, 3, 2, 1, 1, 3, 4, 3, 1, 3, 2, 2, 2, 3, 3, 1, 2, 
    5, 1, 3, 1, 3, 5, 2, 1, 4, 1, 2, 2, 3, 2, 3, 3, 1, 3, 2, 
    3, 1, 2, 3, 2, 2, 3, 3, 2, 5, 2, 2, 2, 3, 2, 3, 1, 3), labels = c(`not at all sufficient` = 1, 
    `rather not sufficient` = 2, appropriate = 3, `rather too restrictive` = 4, 
    `extremely restrictive` = 5), label = "measures to overcome health risks due to corona", class = c("haven_labelled", 
    "vctrs_vctr", "double")), labels = structure(c(2L, 2L, 2L, 
    4L, 1L, 2L, 3L, 4L, 1L, 3L, 2L, 3L, 4L, 5L, 1L, 3L, 3L, 3L, 
    3L, 3L, 4L, 1L, 3L, 1L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 3L, 4L, 
    3L, 2L, 2L, 3L, 3L, 3L, 1L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 3L, 
    2L, 1L, 1L, 3L, 4L, 3L, 1L, 3L, 2L, 2L, 2L, 3L, 3L, 1L, 2L, 
    5L, 1L, 3L, 1L, 3L, 5L, 2L, 1L, 4L, 1L, 2L, 2L, 3L, 2L, 3L, 
    3L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 3L, 3L, 2L, 5L, 2L, 
    2L, 2L, 3L, 2L, 3L, 1L, 3L), .Label = c("not at all sufficient", 
    "rather not sufficient", "appropriate", "rather too restrictive", 
    "extremely restrictive"), class = "factor")), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 645

Answers (2)

TarJae

Reputation: 78907

I am not sure either, but adding weight = wgt_2 to aes() gives a plot that looks quite different:

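# plot_dat is the data frame from the question; weight = wgt_2 passes the weights to the density estimate
ggplot(plot_dat, aes(x=labels, weight = wgt_2, color=cntry, group=cntry)) +
  geom_density() +
  facet_wrap(~type)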

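Because the scale has only five answer categories, one way to sanity-check a weighted curve is to compare it against the weighted share of each category. This is only a sketch: it uses a small made-up tibble (`toy`, with the same column names as plot_dat) instead of the real data, and `wgt_sum` and `share` are names chosen here for illustration.

```r
library(dplyr)

# Toy stand-in for plot_dat (hypothetical values, same column names)
toy <- tibble(
  cntry  = c("Germany", "Germany", "Germany", "Poland", "Poland"),
  type   = "Health meassures",
  labels = factor(
    c("appropriate", "appropriate", "rather not sufficient",
      "appropriate", "rather not sufficient"),
    levels = c("rather not sufficient", "appropriate")
  ),
  wgt_2  = c(1.2, 0.8, 1.0, 0.5, 1.5)
)

# Sum the weights per answer category, then turn the sums into
# shares within each country and policy type
shares <- toy %>%
  count(cntry, type, labels, wt = wgt_2, name = "wgt_sum") %>%
  group_by(cntry, type) %>%
  mutate(share = wgt_sum / sum(wgt_sum)) %>%
  ungroup()

shares
```

If the weighted density puts clearly more or less mass on a category than its weighted share suggests, the weights are probably not being applied as intended.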

Upvotes: 0

Jon Spring

Reputation: 66415

One way to incorporate the weights into your density plot is to use uncount() to make copies of each observation in proportion to its weight. You can control the wiggliness of the lines through the smoothing bandwidth, using bw or adjust. Here I've set adjust to 1.5 so that a wider bandwidth is used and the curves are smoother.

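library(tidyverse)

plot_dat %>%
  # wrap the long labels; fct_relabel() keeps the original order of the factor levels
  mutate(labels_wrap = fct_relabel(labels, str_wrap, width = 12)) %>%
  # copy each row in proportion to its weight; round() gives uncount() whole-number counts
  uncount(round(wgt_2 * 100)) %>%
  ggplot(aes(x=labels_wrap, color=cntry, group=cntry)) +
  geom_density(adjust = 1.5) +  # wider bandwidth, smoother curves
  facet_wrap(~type) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))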

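A possible alternative, sketched here under some assumptions, is to keep the score numeric on the x axis, pass the weights through the weight aesthetic that geom_density() accepts, and attach the text labels to the breaks 1 to 5 with scale_x_continuous(). The data frame `toy` and the vector `lvls` below are made up for illustration; with the real data you would probably map `x = as.numeric(value)`, since value is stored as a haven_labelled column.

```r
library(ggplot2)

# Answer categories in scale order (copied from the question's factor levels)
lvls <- c("not at all sufficient", "rather not sufficient", "appropriate",
          "rather too restrictive", "extremely restrictive")

# Toy stand-in for plot_dat: random scores and weights, same column names
set.seed(1)
toy <- data.frame(
  cntry = rep(c("Germany", "Poland"), each = 50),
  type  = rep(c("Economic meassures", "Health meassures"), times = 50),
  value = sample(1:5, 100, replace = TRUE),
  wgt_2 = runif(100, 0.5, 1.5)
)

p <- ggplot(toy, aes(x = value, weight = wgt_2, color = cntry)) +
  geom_density(adjust = 1.5) +
  # numeric axis, but labelled with the meaning of each score
  scale_x_continuous(breaks = 1:5, labels = lvls) +
  facet_wrap(~type) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

p
```

This avoids multiplying rows, at the cost of relying on how ggplot2 treats the weights internally, so it is worth comparing the result with the uncount() version.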

Upvotes: 1
