Aggregate dataframe into a frequency table

Question

I am looking to reshape a data frame that looks like this, with the variables:

Year, University, Degree, Gender

Each row represents a single student, e.g.:

2017, University College London, Social Science, Male 

2017, University of Leeds, Social science, Non-Binary

I would like to create a frequency table from this data to condense the number of rows: for each university there should be one row for each of the 19 degree categories, and for each degree the count of each gender should be shown. It would look something like this:

Year  University  Degree               Male  Female  Non-Binary
2017  UCL         Biological Sciences  1     0       2

I hope this makes sense. Thank you for the help.

EDIT: I would now like to plot a subset of this data as a line graph. I am currently subsetting outside of the plotting function, like so:

subsetucl <- TFtab[which(TFtab$University == 'University College London'), ]
ggplot(data = subsetucl, aes(x = Year, y = Female, group = Degree, color = Degree)) +
  geom_line() + geom_point(size = 0.8) + xlab("Year of application") + ylab("Frequency of Females") +
  ggtitle("UCL Applications by Degree (2011-2017)") + theme_bw()

What is the best way to subset the data within the plotting call, and how can I display lines for all genders rather than just the female frequencies? Thank you.

svenhalvorson · Accepted Answer

Here's a simple solution with dplyr (plus tidyr for spread()).

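library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr
data %>%
  count(Year, University, Degree, Gender) %>%  # one row per combination, count stored in column n
  spread(key = Gender, value = n, fill = 0)    # one column per gender, 0 where a combination is absent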
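A self-contained sketch of the same idea follows, under a few assumptions. The data set `students` is hypothetical: the first two rows echo the question's example entries and the rest are invented so that every gender appears. It uses pivot_wider(), tidyr's newer replacement for the superseded spread(), and adds a complete() step so that every university lists every degree, even where no students were counted.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per student (rows 1-2 echo the question's examples)
students <- data.frame(
  Year       = 2017,
  University = c("University College London", "University of Leeds",
                 "University College London", "University College London",
                 "University College London", "University of Leeds"),
  Degree     = c("Social Science", "Social Science", "Biological Sciences",
                 "Biological Sciences", "Biological Sciences", "Biological Sciences"),
  Gender     = c("Male", "Non-Binary", "Male", "Non-Binary", "Non-Binary", "Female")
)

freq <- students %>%
  # one row per Year/University/Degree/Gender combination, count in column n
  count(Year, University, Degree, Gender) %>%
  # add zero-count rows so each existing Year/University pair has every Degree and Gender
  complete(nesting(Year, University), Degree, Gender, fill = list(n = 0)) %>%
  # one column per gender (Female, Male, Non-Binary)
  pivot_wider(names_from = Gender, values_from = n)

print(freq)
```

With this toy input the UCL / Biological Sciences row has Male = 1, Female = 0 and Non-Binary = 2, matching the example row in the question. A gender that never appears anywhere in the data will not get a column.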

But seriously, use the search function on Stack Overflow. Here's a book to help with R.
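The answer above does not cover the EDIT about plotting. One possible approach, not from the original thread, is to reshape the wide count table to long format so that gender becomes a single column that can be mapped to color, and to do the subsetting directly in ggplot()'s data argument. This sketch assumes TFtab has the columns Year, University, Degree, Female, Male and Non-Binary (only Female is confirmed by the question's code); the small TFtab below and the name TFtab_long are hypothetical stand-ins.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for TFtab: wide counts, one column per gender
TFtab <- data.frame(
  Year         = c(2016, 2017, 2016, 2017, 2017),
  University   = c(rep("University College London", 4), "University of Leeds"),
  Degree       = c("Biological Sciences", "Biological Sciences",
                   "Social Science", "Social Science", "Social Science"),
  Female       = c(3, 0, 2, 4, 1),
  Male         = c(1, 1, 5, 2, 0),
  `Non-Binary` = c(0, 2, 1, 0, 1),
  check.names  = FALSE  # keep the hyphen in "Non-Binary"
)

# Long format: one row per Year/University/Degree/Gender with its Count
TFtab_long <- pivot_longer(TFtab, c(Female, Male, `Non-Binary`),
                           names_to = "Gender", values_to = "Count")

# Subset inside the ggplot() call; one line per gender, one panel per degree
p <- ggplot(data = filter(TFtab_long, University == "University College London"),
            aes(x = Year, y = Count, color = Gender)) +
  geom_line() +
  geom_point(size = 0.8) +
  facet_wrap(~ Degree) +
  labs(x = "Year of application", y = "Frequency",
       title = "UCL Applications by Degree and Gender") +
  theme_bw()

print(p)
```

Faceting by Degree keeps each panel readable when there are many degree categories; with only a few, `color = Degree, linetype = Gender` in a single panel is an alternative. In base R, `data = subset(TFtab_long, University == "University College London")` does the same subsetting without dplyr.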
