Reputation: 13
I'm just getting started in R and I'm trying to wrap my head around barplots for a university assignment. Specifically, I am using the General Social Survey 2018 dataset (codebook: https://www.thearda.com/Archive/Files/Codebooks/GSS2018_CB.asp) and I am trying to figure out whether religion has any effect on the way people seek help for mental health. I want to use reliten (self-assessment of religiousness, from strong to no religion) as the IV and tlkclrgy (asks whether a person with mental health issues should reach out to a religious leader, yes or no) as the DV. For a better visualization of the data, I want to create a side-by-side barplot with reliten on the x-axis that shows how many people answered yes and no on tlkclrgy. My problem is that on the barplot I get numbers instead of categories (from strong to no religion). This is what I tried, but I keep getting NA on the x-axis:
GSS$reliten <- factor(as.character(GSS$reliten),
                      levels = c("No religion", "Somewhat strong",
                                 "Not very strong", "Strong"))
GSS <- GSS18[!GSS18$tlkclrgy %in% c(0, 8, 9),]
GSS$reliten <- as_factor(GSS$reliten)
GSS$tlkclrgy <- as_factor(GSS$tlkclrgy)
ggplot(data=GSS,mapping=aes(x=reliten,fill=tlkclrgy))+
geom_bar(position="dodge")
Does anybody have any tips?
Upvotes: 1
Views: 56
Reputation: 76402
Here is complete code to download the codebook and data, table the two columns of interest and plot the frequencies.
Data will be downloaded to a temporary directory to keep my disk tidy. These first two instructions are optional.
od <- getwd()
setwd("~/Temp")
These are the links to the two files that need to be read and the filenames.
cols_url <- "https://osf.io/ydxu4/download"
cols_file <- "General Social Survey, 2018.col"
data_url <- "https://osf.io/e76rv/download"
data_file <- "General Social Survey, 2018.dat"
download.file(cols_url, cols_file, mode = "wb")
download.file(data_url, data_file, mode = "wb")
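As an alternative to changing the working directory, the files can be written into R's per-session temporary directory. The following is only a sketch: fetch_if_missing, tmp_dir, cols_path and data_path are names made up for this illustration and are not part of the code above.

```r
# Hypothetical helper: download a file only if it is not already on disk
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb")
  }
  dest
}

# Build full paths inside the session's temporary directory,
# so the working directory never has to change
tmp_dir   <- tempdir()
cols_path <- file.path(tmp_dir, "General Social Survey, 2018.col")
data_path <- file.path(tmp_dir, "General Social Survey, 2018.dat")

# Usage (uncomment to download):
# fetch_if_missing("https://osf.io/ydxu4/download", cols_path)
# fetch_if_missing("https://osf.io/e76rv/download", data_path)
```

With this approach, the later readLines() and read.fwf() calls would take cols_path and data_path instead of the bare file names.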
Now read in the codebook and process it, extracting the column widths and column names.
cols <- readLines(cols_file)
cols <- strsplit(cols, ": ")
widths_char <- sapply(cols, '[', 2)
i_widths <- grepl("-", widths_char)
f <- function(x) -eval(parse(text = x)) + 1L
widths <- rep(1L, length(widths_char))
widths[i_widths] <- vapply(widths_char[i_widths], f, numeric(1), USE.NAMES = FALSE)
col_names <- sapply(cols, '[', 1)
col_names <- trimws(sub("^.[^ ]* ", "", col_names))
col_names <- tolower(col_names)
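The width of a field given as start-end is end - start + 1, and a field given as a single position has width 1. Here is a minimal sketch of that computation without eval(parse()). The position strings are made up for illustration and range_width is a hypothetical helper name.

```r
# Made-up position strings; the real ones come from the .col file
widths_char <- c("1-4", "5", "6-13")

# Hypothetical helper: width of one "start-end" or single-position field
range_width <- function(x) {
  bounds <- as.integer(strsplit(x, "-", fixed = TRUE)[[1]])
  if (length(bounds) == 1L) 1L else bounds[2] - bounds[1] + 1L
}

widths <- vapply(widths_char, range_width, integer(1), USE.NAMES = FALSE)
widths
# 4 1 8
```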
Finally, read the fixed width text file.
df1 <- read.fwf(data_file, widths = widths, header = FALSE, na.strings = "-", col.names = col_names)
Find where the two columns we want are with grep.
i_cols <- c(
grep("reliten", col_names, ignore.case = TRUE),
grep("tlkclrgy", col_names, ignore.case = TRUE)
)
head(df1[i_cols])
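Note that grep does partial matching, so it can return more than one position if another column name happens to contain the same string. If that turns out to be the case, an exact lookup with match is one option. In this sketch the column names, including reliten_alt, are hypothetical.

```r
# Hypothetical column names; the real vector comes from the codebook
col_names <- c("year", "reliten", "reliten_alt", "tlkclrgy")

# Partial matching finds both "reliten" and "reliten_alt"
partial <- grep("reliten", col_names)

# Exact matching returns one position per requested name
i_cols <- match(c("reliten", "tlkclrgy"), col_names)
i_cols
# 2 4
```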
Table those columns and coerce the result to a data.frame, then coerce the columns to factor.
There is one problem here: the published codebook has no answer 3 for tlkclrgy, but the data file contains answers coded 3, so I have created an extra factor level for them.
GSS <- as.data.frame(table(df1[i_cols]))
labels_reliten <- c(
"Not applicable",
"Strong",
"Not very strong",
"Somewhat strong",
"No religion",
"Don't know",
"No answer"
)
levels_reliten <- c(0, 1, 2, 3, 4, 8, 9)
labels_tlkclrgy <- c(
"Not applicable",
"Yes",
"No",
"Not in codebook",
"Don't know",
"No answer"
)
levels_tlkclrgy <- c(0, 1, 2, 3, 8, 9)
GSS$reliten <- factor(
GSS$reliten,
labels = labels_reliten,
levels = levels_reliten
)
GSS$tlkclrgy <- factor(
GSS$tlkclrgy,
labels = labels_tlkclrgy,
levels = levels_tlkclrgy
)
library(ggplot2)
ggplot(data = GSS, mapping = aes(x = reliten, y = Freq, fill = tlkclrgy)) +
geom_col(position = "dodge")
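As for the NA on the x-axis in the question: a likely cause is that factor() turns every value that does not appear in levels into NA. If the column holds numeric codes, as it does when read from the fixed-width file above, label strings such as "Strong" never match. Below is a minimal sketch that matches on the codes, attaches the labels, and keeps only the substantive answers. The toy data frame is invented for illustration and is not real GSS data. The code-to-label mapping is the one used above.

```r
library(ggplot2)

# Toy respondent-level codes, invented for illustration (not real GSS data)
toy <- data.frame(
  reliten  = c(1, 4, 2, 3, 1, 0, 9, 4),
  tlkclrgy = c(1, 2, 2, 1, 2, 1, 8, 1)
)

# Keep only substantive answers: religiosity codes 1 to 4 and Yes/No
toy <- toy[toy$reliten %in% 1:4 & toy$tlkclrgy %in% 1:2, ]

# Match on the codes and attach labels; the order of levels
# sets the order of the bars on the x-axis
toy$reliten <- factor(
  toy$reliten,
  levels = c(1, 3, 2, 4),
  labels = c("Strong", "Somewhat strong", "Not very strong", "No religion")
)
toy$tlkclrgy <- factor(toy$tlkclrgy, levels = c(1, 2), labels = c("Yes", "No"))

# geom_bar counts rows itself, so no Freq column is needed here
p <- ggplot(toy, aes(x = reliten, fill = tlkclrgy)) +
  geom_bar(position = "dodge")
p
```

The same filtering can be applied to the tabulated GSS data frame above by subsetting on the factor labels and calling droplevels() before plotting with geom_col.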
Upvotes: 1