dbeginner

Reputation: 11

Scatterplot in RStudio with ggplot function

I'm trying to see whether there is any correlation between education level and cholesterol awareness, using the 2013 Behavioral Risk Factor Surveillance System (BRFSS) dataset. The variables are described in the codebook linked below: https://d18ky98rnyall9.cloudfront.net/_e34476fda339107329fc316d1f98e042_brfss_codebook.html?Expires=1541203200&Signature=WYq5YJFg5WgVOFV4dWPV~pPtu-31ubNEVxEYlNliJZpqZYXfZ741WN9n~RC~kcF0gE6AdxzzNFbiA7nv5DtQsxeWWs1Y9obwadm2PjV8eO~W0TI0YtyU~vmaWgozEkfbzIB17LP0MFY-dUffEsyb29~~JWYnQXHAZXdm-n5q108_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A##sleptim1

I used two variables for the EDA: "educa" (Education Level) and "cholchk" (How Long Since Cholesterol Checked). This is the code I wrote:

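> library(dplyr)
> library(ggplot2)

> # Keep the two variables and drop rows with missing values
> q1 <- select(brfss2013, cholchk, educa) %>%
        filter(!is.na(cholchk), !is.na(educa))

> # Count respondents in each cholchk category
> q1 %>% group_by(cholchk) %>% summarise(count = n())

> # Scatterplot of the two variables with a linear smoother
> ggplot(data = q1, aes(x = educa, y = cholchk)) +
    geom_point(shape = 1) +
    geom_smooth(method = "lm") +
    xlab("educa: Education Level") +
    ylab("cholchk: How Long Since Cholesterol Checked")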

The graph was created successfully, but all the points fall on a regular grid, so I can't judge whether there is any correlation. Could you give me some advice on how to get a more informative plot?

scatterplot image

I don't know how to upload an ".RData" file with my question, so this summary of the two variables is the best I can do.

cholchk
Within past year :321955
Within past 2 years: 49354
Within past 5 years: 29870
5 or more years ago: 15683

educa
Never attended school or only kindergarten : 463
Grades 1 through 8 (Elementary) : 10189
Grades 9 though 11 (Some high school) : 21173
Grade 12 or GED (High school graduate) :117152
College 1 year to 3 years (Some college or technical school):113993
College 4 years or more (College graduate) :153892
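
A possible way forward, offered as a minimal sketch rather than a definitive answer: both educa and cholchk are categorical, so every respondent lands on one of a few grid positions and a scatterplot hides how many respondents share each position. Counting the combinations and plotting the share of each cholchk answer within each education level makes any difference between levels visible. The toy data frame and the names toy, shares, and p below are made up for illustration and do not reproduce the BRFSS counts; with the real data, q1 would take the place of toy.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for q1: randomly generated rows, NOT the real BRFSS data
set.seed(1)
educa_levels <- c("Some high school", "High school graduate", "College graduate")
chol_levels  <- c("Within past year", "Within past 2 years", "5 or more years ago")
toy <- data.frame(
  educa   = factor(sample(educa_levels, 300, replace = TRUE), levels = educa_levels),
  cholchk = factor(sample(chol_levels, 300, replace = TRUE), levels = chol_levels)
)

# Count each (educa, cholchk) combination, then convert the counts
# to shares within each education level
shares <- toy %>%
  count(educa, cholchk) %>%
  group_by(educa) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Stacked bars of equal height: compare how the fill colours split
# across education levels
p <- ggplot(shares, aes(x = educa, y = prop, fill = cholchk)) +
  geom_col() +
  xlab("Education Level") +
  ylab("Share of respondents") +
  labs(fill = "How Long Since Cholesterol Checked")

print(p)
```

If the share of "Within past year" rises or falls steadily from one education level to the next, that suggests an association worth testing formally; bars that look alike suggest little relationship.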

Upvotes: 0

Views: 123

Answers (0)
