Elisabeth B.

Reputation: 13

How do I create barplots with categories instead of numbers?

I'm just getting started in R and I'm trying to wrap my head around barplots for a university assignment. Specifically, I am using the General Social Survey 2018 dataset (codebook: https://www.thearda.com/Archive/Files/Codebooks/GSS2018_CB.asp) and I am trying to figure out whether religion has any effect on the way people seek help for mental health.

I want to use reliten (self-assessment of religiousness, from strong to no religion) as the IV and tlkclrgy (asks whether a person with mental health issues should reach out to a religious leader, yes or no) as the DV. For a better visualization of the data, I want to create a side-by-side barplot with reliten on the x-axis, showing how many people answered yes and no on tlkclrgy.

My problem is that on the barplot I get numbers instead of categories (from strong to no religion). This is what I tried, but I keep getting NA on the x-axis:

GSS$reliten <- factor(as.character(GSS$reliten), 
                      levels = c("No religion", "Somewhat 
                                 strong", "Not very strong", 
                                 "Strong"))
GSS <- GSS18[!GSS18$tlkclrgy %in% c(0, 8, 9),] 
GSS$reliten <- as_factor(GSS$reliten)
GSS$tlkclrgy <- as_factor(GSS$tlkclrgy)
ggplot(data=GSS,mapping=aes(x=reliten,fill=tlkclrgy))+
  geom_bar(position="dodge")

Does anybody have any tips?

Upvotes: 1

Views: 56

Answers (1)

Rui Barradas

Reputation: 76402

Here is complete code to download the codebook and data, table the two columns of interest and plot the frequencies.

1. Read the data

Data will be downloaded to a temporary directory to keep my disk tidy. These first two instructions are optional.

od <- getwd()
setwd("~/Temp")

These are the links to the two files that need to be read and the filenames.

cols_url <- "https://osf.io/ydxu4/download"
cols_file <- "General Social Survey, 2018.col"
data_url <- "https://osf.io/e76rv/download"
data_file <- "General Social Survey, 2018.dat"

download.file(cols_url, cols_file, mode = "wb")
download.file(data_url, data_file, mode = "wb")

Now read in the codebook and process it, extracting the column widths and column names.

cols <- readLines(cols_file)
cols <- strsplit(cols, ": ")
widths_char <- sapply(cols, '[', 2)
i_widths <- grepl("-", widths_char)

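# Turn a range string such as "1-4" into its width: -(1 - 4) + 1 = 4
f <- function(x) -eval(parse(text = x)) + 1L
widths <- rep(1L, length(widths_char))
# Apply f to each range string in turn (f handles one string at a time)
widths[i_widths] <- sapply(widths_char[i_widths], f)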

col_names <- sapply(cols, '[', 1)
col_names <- trimws(sub("^.[^ ]* ", "", col_names))
col_names <- tolower(col_names)
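To see what the width computation is doing, here is a minimal sketch on made-up position strings. It assumes each codebook entry gives a column's position either as a single number or as a "start-end" range, which is what the grepl("-", ...) test suggests; the three strings below are illustrative and not taken from the real file.

```r
# Hypothetical column positions: a "start-end" range or a single position.
widths_char <- c("1-4", "5", "6-7")
i_widths <- grepl("-", widths_char)

# "1-4" evaluates to 1 - 4 = -3; negating and adding 1 gives the width 4.
f <- function(x) -eval(parse(text = x)) + 1L

# Single-position columns keep the default width of 1.
widths <- rep(1L, length(widths_char))

# Apply f to one string at a time: eval() on several parsed
# expressions would return only the value of the last one.
widths[i_widths] <- vapply(widths_char[i_widths], f, numeric(1), USE.NAMES = FALSE)

widths
```

The resulting vector has one width per column and is what read.fwf expects in its widths argument.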

Finally, read the fixed-width text file.

df1 <- read.fwf(data_file, widths = widths, header = FALSE, na.strings = "-", col.names = col_names)
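As a small illustration of how read.fwf uses the widths, the sketch below writes two made-up records to a temporary file and reads them back. The record layout (a 4-character year, a 1-character reliten code, a 2-character id) is an assumption for the example only and does not reflect the real GSS file.

```r
# Two hypothetical fixed-width records: year (4), reliten (1), id (2).
tmp <- tempfile(fileext = ".dat")
writeLines(c("2018112", "2018241"), tmp)

# Each width tells read.fwf how many characters belong to a column.
toy <- read.fwf(
  tmp,
  widths = c(4, 1, 2),
  header = FALSE,
  col.names = c("year", "reliten", "id")
)

toy
unlink(tmp)
```

If a single width is off, every later column is shifted, so it is worth checking a few rows with head() against the codebook.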

2. Table the data

Use grep to find the positions of the two columns we want.

i_cols <- c(
  grep("reliten", col_names, ignore.case = TRUE),
  grep("tlkclrgy", col_names, ignore.case = TRUE)
)
 
head(df1[i_cols])

Table those columns and coerce to data.frame. Then coerce the columns to factor.

There is a problem here: the published codebook lists no answer 3 for tlkclrgy, but the data file does contain 3s. So I have created an extra factor level for it.

GSS <- as.data.frame(table(df1[i_cols]))
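For readers unfamiliar with this idiom, here is a minimal sketch of what as.data.frame(table(...)) returns, using a made-up six-row data frame rather than the survey data.

```r
# Toy responses (hypothetical codes, not real GSS rows).
toy <- data.frame(
  reliten  = c(1, 1, 2, 4, 4, 4),
  tlkclrgy = c(1, 2, 1, 2, 2, 1)
)

# table() cross-tabulates the two columns; as.data.frame() turns the
# table into long format with one row per combination and a Freq column.
counts <- as.data.frame(table(toy))

counts
str(counts)
```

Note that the two code columns come back as factors whose levels are the codes as text ("1", "2", ...), and that combinations with no observations are kept with Freq equal to 0.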

labels_reliten <- c(
  "Not applicable", 
  "Strong", 
  "Not very strong", 
  "Somewhat strong",
  "No religion",
  "Don't know",
  "No answer"
)
levels_reliten <- c(0, 1, 2, 3, 4, 8, 9)
labels_tlkclrgy <- c(
  "Not applicable", 
  "Yes",
  "No",
  "Not in codebook",
  "Don't know",
  "No answer"
)
levels_tlkclrgy <- c(0, 1, 2, 3, 8, 9)

GSS$reliten <- factor(
  GSS$reliten, 
  labels = labels_reliten,
  levels = levels_reliten
)
GSS$tlkclrgy <- factor(
  GSS$tlkclrgy,
  labels = labels_tlkclrgy,
  levels = levels_tlkclrgy
)
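The difference between levels and labels in factor() is easy to mix up, so here is a minimal sketch on a made-up vector of codes. It may also explain the NA values in the question, assuming the reliten column there holds numeric codes rather than label text.

```r
# Hypothetical reliten codes as they might appear in the data.
codes <- c(1, 2, 3, 4, 2)

# levels = the values stored in the data; labels = the text shown instead.
ok <- factor(
  codes,
  levels = c(1, 2, 3, 4),
  labels = c("Strong", "Not very strong", "Somewhat strong", "No religion")
)

# Passing the display text as levels matches none of the stored codes,
# so every element becomes NA.
bad <- factor(
  as.character(codes),
  levels = c("No religion", "Somewhat strong", "Not very strong", "Strong")
)

ok
bad
```

Any value that is not listed in levels is silently turned into NA, which is why the code above adds a level for the undocumented answer 3.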

3. Plot the frequencies table

library(ggplot2)

ggplot(data = GSS, mapping = aes(x = reliten, y = Freq, fill = tlkclrgy)) +
  geom_col(position = "dodge")
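If only the substantive answers are wanted on the plot, as in the question, the frequency table can be filtered first. The sketch below does this on a small made-up table with the same column names; the counts are invented for illustration.

```r
# Hypothetical frequency table shaped like GSS (three levels per factor).
counts <- expand.grid(
  reliten  = c("Not applicable", "Strong", "No religion"),
  tlkclrgy = c("Not applicable", "Yes", "No"),
  stringsAsFactors = TRUE
)
counts$Freq <- c(50, 0, 0, 0, 30, 10, 0, 20, 40)

# Keep only Yes/No answers and drop the "Not applicable" reliten rows.
keep <- counts$tlkclrgy %in% c("Yes", "No") &
  counts$reliten != "Not applicable"

# droplevels() removes the now-unused levels so they do not show up
# as empty categories on the axis or in the legend.
counts_sub <- droplevels(counts[keep, ])

counts_sub
levels(counts_sub$tlkclrgy)
```

Applying the same filter to GSS and passing the result to the ggplot() call above should leave only the Yes and No bars for each reliten category.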


Upvotes: 1
