Reputation: 93
I am really new to R and this is probably a really basic question. Say I have a data set with 2 columns describing students, who are either male or female. One column has the student and the other column has the gender. How do I find the percentage of each?
Upvotes: 5
Views: 115265
Reputation: 651
This is probably not the most efficient way to do this, but this is one way to solve the problem.
First you have to create a data.frame. Here is an artificial one:
students <- data.frame(student = c("Carla", "Josh", "Amanda", "Gabriel", "Shannon", "Tiffany"), gender = c("Female", "Male", "Female", "Male", "Female", "Female"))
View(students)
Then I use prop.table, which turns the counts in a table into proportions of the total. I multiply by 100 to express those proportions as percentages, and I coerce the result to a data.frame because I love data.frames.
tablature <- as.data.frame.matrix(prop.table(table(students)) * 100)
tablature
I decided to call my data frame tablature. It says "Amanda" is 16 + (2 / 3) % in the Female column. That means she is female, so her value in the Male column is 0, and since my data.frame has 6 students, (1 / 6) * 100 makes her 16.667 percent of the set.
Now what percentage of females and males are there? There are two ways: get the totals for both columns at once with the apply function, or total each column one at a time with the sum function.
apply(tablature, 2, FUN = sum)
Female Male
66.66667 33.33333
Read those values as percentages.
Here tablature is the proportion-table data frame that I am applying the sum function to, and 2 tells apply to work across the columns (2 for columns, 1 for rows).
If you just eyeball the small amount of data, you can see that there are 2 / 6 = 33.33333% males and 4 / 6 = 66.66667% females in the data.frame students, so I did the calculation correctly.
Alternatively,
sum(tablature$Female)
[1] 66.66667
sum(tablature$Male)
[1] 33.33333
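To make the margin argument of apply concrete, here is a minimal sketch that rebuilds the same tablature and sums it both ways. The variable names by_gender, by_student and by_gender_short are only illustrative, and colSums is a base R shortcut that is not part of the original answer.

```r
# Rebuild the example from this answer so the block runs on its own
students <- data.frame(
  student = c("Carla", "Josh", "Amanda", "Gabriel", "Shannon", "Tiffany"),
  gender  = c("Female", "Male", "Female", "Male", "Female", "Female")
)
tablature <- as.data.frame.matrix(prop.table(table(students)) * 100)

# Margin 2: one sum per column, i.e. the percentage for each gender
by_gender <- apply(tablature, 2, sum)

# Margin 1: one sum per row, i.e. each student's share of the whole set
by_student <- apply(tablature, 1, sum)

# colSums() should give the same per-column totals as apply(..., 2, sum)
by_gender_short <- colSums(tablature)

print(by_gender)
print(by_student)
```

With margin 1 every student should come out at the same share (100 / 6), which is why margin 2 is the one that answers the question.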
You can also make a barplot. Because tablature is a data frame, you have to convert it to a matrix before passing it to barplot.
From there you get a stacked visual comparison of gender.
barplot(as.matrix(tablature), xlab = "Gender", main = "Barplot comparison of Gender Among Students", ylab = "Percentages of Student Group")
It's stacking because R made each student a box of 16.6667%.
To be honest it looks better if you just plot the output of the apply function. Of course you could save it to a variable first. But naahhh ...
barplot(apply(tablature, 2, FUN = sum), col = c("green", "blue"),xlab = "Gender", ylab = "Percentage of Total Students", main = "Barplot showing the Percentages of Gender Represented Among Students", cex.main = 1)
Now it doesn't stack.
Upvotes: 0
Reputation: 66
There are already some good answers to this question, but as the original submitter admits to being new to R, I wanted to provide a very long form answer. The answer below takes more than the minimum necessary number of steps and doesn't use helpers like pipes.
Hopefully, providing an answer in this way helps the original submitter understand what is happening with each step.
# Load the dplyr library
library("dplyr")
# Create an example data frame
students <-
data.frame(
names = c("Bill", "Stacey", "Fred", "Jane", "Sarah"),
gender = c("M", "F", "M", "F", "F"),
stringsAsFactors = FALSE
)
# Count the total number of students.
total_students <- nrow(students)
# Use dplyr filter to obtain just Female students
all_female_students <- dplyr::filter(students, gender %in% "F")
# Count total number of female students
total_female <- nrow(all_female_students)
# Repeat to find total number of male students
all_male_students <- dplyr::filter(students, gender %in% "M")
total_male <- nrow(all_male_students)
# Divide total female students by total students
# and multiply result by 100 to obtain a percentage
percent_female <- (total_female / total_students) * 100
# Repeat for males
percent_male <- (total_male / total_students) * 100
> percent_female
[1] 60
> percent_male
[1] 40
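Once the long form above makes sense, the same arithmetic can be condensed. This is only a sketch of one possible shortcut, not the method used in this answer: it relies on base R treating TRUE as 1 and FALSE as 0, so the mean of a logical vector is the fraction of TRUE values.

```r
# Same example data frame as above
students <-
  data.frame(
    names = c("Bill", "Stacey", "Fred", "Jane", "Sarah"),
    gender = c("M", "F", "M", "F", "F"),
    stringsAsFactors = FALSE
  )

# students$gender == "F" is TRUE for each female student.
# mean() of that logical vector is the share of TRUE values,
# which equals total_female / total_students.
percent_female <- mean(students$gender == "F") * 100

# Repeat for males
percent_male <- mean(students$gender == "M") * 100

print(percent_female)
print(percent_male)
```

The two results should agree with the step-by-step values computed above.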
Upvotes: 1
Reputation: 694
You can use the table() function to produce a table telling you how many males and females there are among the students. Then divide this table by the total number of students (you can get this with the length() function). Finally, multiply the result by 100.
Your code should be something like:
proportions <- table(your_data_frame$gender_column)/length(your_data_frame$gender_column)
percentages <- proportions*100
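As a runnable sketch of the two lines above, here they are applied to a small made-up data frame. The students data and its column names are assumptions for illustration, and the prop.table() line is an optional base R shortcut for the division step.

```r
# Hypothetical example data; substitute your own data frame and column
students <- data.frame(
  names  = c("Bill", "Stacey", "Fred", "Jane", "Sarah"),
  gender = c("M", "F", "M", "F", "F")
)

# Count how many students fall into each gender
counts <- table(students$gender)

# Divide by the total number of students, then scale to percentages
proportions <- counts / length(students$gender)
percentages <- proportions * 100

# prop.table() performs the same division by the total in one call
percentages_alt <- prop.table(counts) * 100

print(percentages)
```

The result keeps the gender labels as names, so a single value can be pulled out with percentages["F"].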
Upvotes: 3
Reputation: 5600
Another way, using data.table:
students <- data.frame( names = c( "Bill", "Stacey", "Fred", "Jane", "Sarah" ),
gender = c( "M", "F", "M", "F", "F" ),
stringsAsFactors = FALSE )
library( data.table )
setDT( students )[ , 100 * .N / nrow( students ), by = gender ]
# gender V1
# 1: M 40
# 2: F 60
Or dplyr:
library( dplyr )
students %>%
group_by( gender ) %>%
summarise( percent = 100 * n() / nrow( students ) )
# A tibble: 2 × 2
# gender percent
# <chr> <dbl>
# 1 F 60
# 2 M 40
These are both popular packages for operations like these but, as has already been pointed out, you can also stick with base R if you prefer.
Upvotes: 9