Cryena

Reputation: 33

Comparing values for different frequencies using graphs in R

My data looks like this:

          ID       Date Reduction Collected Provided Freq Gender
1  AAA016000 2018-04-10         0         0        7    1   <NA>
2  AAA059717 2017-03-21         1         0       45   10 Female
3  AAA059717 2017-04-22         0         0       10   10 Female
4  AAA059717 2017-05-09         0         0       10    2 Female
5  AAA059717 2017-06-09         1         0       40    6 Female
6  AAA059717 2018-07-03        NA       180      200   35 Female
7  AAA059717 2018-09-26        NA        10       30   15 Female
8  AAA059717 2018-09-26         1        NA       NA   NA Female
9  AAA059717 2018-10-12        NA         0       20    3 Female
10 AAA059717 2018-11-07        NA        30       50   20 Female
11 AAA059717 2018-11-07         0        NA       NA   NA Female
12 AAA059717 2018-11-08        NA         2       20   10 Female

'data.frame':   190122 obs. of  7 variables:
 $ ID       : chr  "AAA016000" "AAA059717" "AAA059717" "AAA059717" ...
 $ Date     : Date, format: "2018-04-10" "2017-03-21" "2017-04-22" "2017-05-09" ...
 $ Reduction: num  0 1 0 0 1 NA NA 1 NA NA ...
 $ Collected: num  0 0 0 0 0 180 10 NA 0 30 ...
 $ Provided : num  7 45 10 10 40 200 30 NA 20 50 ...
 $ Freq     : num  1 10 10 2 6 35 15 NA 3 20 ...
 $ Gender   : chr  NA "Female" "Female" "Female" ...

To find out whether a higher Freq goes along with a higher Provided, I tried this:

ggplot(data = df, aes(x = Freq, y = Provided)) +
  geom_point() +
  geom_line()

But the graph doesn't look right.

How do I make a better graph to visualize whether a higher Freq goes with a higher Provided than a lower Freq does? And lastly, how do I visualize whether rows with a Freq of 10 or more have higher Provided values than rows with a Freq under 10? Thank you for your response, I appreciate it.

Upvotes: 1

Views: 64

Answers (1)

danlooo

Reputation: 10637

In the sample you posted (10 complete rows), there is a strong, significant linear correlation between Freq and Provided (Pearson, R = 0.89, p < 0.001).

Rows with a Freq of 10 or more do not have significantly higher Provided values (Wilcoxon rank sum test, p = 0.16). Keep in mind that splitting the Freq variable into two categories (high and low) is often arbitrary, and significance can depend strongly on the threshold (here 10).

library(tidyverse)
library(ggpubr)

df <- tribble(
  ~row_id, ~ID, ~Date, ~Reduction, ~Collected, ~Provided, ~Freq, ~Gender,
  1, "AAA016000", "2018-04-10", 0, 0, 7, 1, NA,
  2, "AAA059717", "2017-03-21", 1, 0, 45, 10, "Female",
  3, "AAA059717", "2017-04-22", 0, 0, 10, 10, "Female",
  4, "AAA059717", "2017-05-09", 0, 0, 10, 2, "Female",
  5, "AAA059717", "2017-06-09", 1, 0, 40, 6, "Female",
  6, "AAA059717", "2018-07-03", NA, 180, 200, 35, "Female",
  7, "AAA059717", "2018-09-26", NA, 10, 30, 15, "Female",
  8, "AAA059717", "2018-09-26", 1, NA, NA, NA, "Female",
  9, "AAA059717", "2018-10-12", NA, 0, 20, 3, "Female",
  10, "AAA059717", "2018-11-07", NA, 30, 50, 20, "Female",
  11, "AAA059717", "2018-11-07", 0, NA, NA, NA, "Female",
  12, "AAA059717", "2018-11-08", NA, 2, 20, 10, "Female"
)

df %>%
  ggplot(aes(Freq, Provided)) +
  geom_point() +
  stat_smooth(method = "lm") +
  stat_cor(method = "pearson")
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 2 rows containing non-finite values (stat_smooth).
#> Warning: Removed 2 rows containing non-finite values (stat_cor).
#> Warning: Removed 2 rows containing missing values (geom_point).

df %>%
  mutate(high_Freq = Freq >= 10) %>%
  filter(!is.na(high_Freq)) %>%
  ggplot(aes(high_Freq, Provided)) +
  geom_boxplot() +
  stat_compare_means()

Created on 2021-11-10 by the reprex package (v2.0.1)
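
To get a feel for how much the result depends on the cutoff, here is a minimal base R sketch that repeats the Wilcoxon test for a few thresholds on the same 10 complete rows. The list of thresholds is my own arbitrary choice for illustration, and `exact = FALSE` is set explicitly because the sample contains ties; on the full data set the numbers will of course differ.

```r
# Complete (non-NA) Freq/Provided pairs from the 12-row sample in the question
freq     <- c(1, 10, 10, 2, 6, 35, 15, 3, 20, 10)
provided <- c(7, 45, 10, 10, 40, 200, 30, 20, 50, 20)

# Illustrative cutoffs (an assumption, not from the question);
# each one leaves at least two observations on both sides
thresholds <- c(3, 6, 10, 15)

p_values <- sapply(thresholds, function(cutoff) {
  high <- provided[freq >= cutoff]
  low  <- provided[freq <  cutoff]
  # exact = FALSE: use the normal approximation, since the data contain ties
  wilcox.test(high, low, exact = FALSE)$p.value
})

# One row per cutoff, so the p-values can be compared side by side
data.frame(threshold = thresholds, p_value = round(p_values, 3))
```

If the p-values swing widely between cutoffs, that is a hint to prefer the scatter plot with the regression line over a binary split.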

Upvotes: 1
