daultongray8

Reputation: 93

How do I find the percentage of something in R?

I am really new at R and this is probably a really basic question, but let's say I have a data set with 2 columns describing students, who are male or female. One column has the student's name, and the other column has the gender. How do I find the percentage of each gender?

Upvotes: 5

Views: 115265

Answers (4)

xyz123

Reputation: 651

This is probably not the most efficient way to do this, but this is one way to solve the problem.

First, you have to create a data.frame. Here is an artificial one:

students <- data.frame(student = c("Carla", "Josh", "Amanda", "Gabriel", "Shannon", "Tiffany"), gender = c("Female", "Male", "Female", "Male", "Female", "Female"))

View(students) 

Then I use prop.table, which gives me a proportion table: each count from table() divided by the grand total (here, the 6 students). I coerce it to a data.frame because I love data.frames, and I multiply by 100 to turn the proportions into percentages.

tablature <- as.data.frame.matrix(prop.table(table(students)) * 100)
tablature 

I decided to call my data frame tablature. It says that "Amanda" is 16.67% in the Female column. That means she is female (and thus 0 in the Male column), and since my data.frame has 6 students, (1 / 6) * 100 makes her 16.667 percent of the set.

Now, what percentage of the students are female and male? There are two ways: 1) get the total of each column at the same time with the apply function, or 2) get the total of each column one at a time with the sum function.

apply(tablature, 2, FUN = sum)

  Female     Male
66.66667 33.33333

Read these values as percentages.

Here, tablature is the proportion-table data frame that I apply the sum function to, and the 2 tells apply to work across the columns (2 for columns, 1 for rows).

If you just eyeball this small data set, you can see that 2 / 6 = 33.33% of the students in the data.frame students are male and 4 / 6 = 66.67% are female, so the calculation is correct.

Alternatively,

sum(tablature$Female)

[1] 66.66667

sum(tablature$Male)

[1] 33.33333
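Base R also has colSums(), which adds up every column in a single call. A minimal sketch, assuming the same students and tablature objects as above (rebuilt here so the snippet runs on its own); gender_percent is just an illustrative name. It should match the apply() result:

```r
# Rebuild the example data and the percentage table from above
students <- data.frame(
  student = c("Carla", "Josh", "Amanda", "Gabriel", "Shannon", "Tiffany"),
  gender = c("Female", "Male", "Female", "Male", "Female", "Female")
)
tablature <- as.data.frame.matrix(prop.table(table(students)) * 100)

# colSums() sums every column at once, like apply(tablature, 2, FUN = sum)
gender_percent <- colSums(tablature)
gender_percent
```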

You can also make a barplot. Because I formatted the result as a data.frame, you have to convert it to a matrix first, since barplot() expects a vector or a matrix.

From there, you can make a stacked barplot comparing the genders.

barplot(as.matrix(tablature), xlab = "Gender", main = "Barplot comparison of Gender Among Students", ylab = "Percentages of Student Group")

The bars are stacked because each student becomes a separate segment of 16.67%.

To be honest, it looks better if you just plot the output of the apply function. Of course, you could save it to a variable first. But naahhh ...

barplot(apply(tablature, 2, FUN = sum), col = c("green", "blue"),xlab = "Gender", ylab = "Percentage of Total Students", main = "Barplot showing the Percentages of Gender Represented Among Students", cex.main = 1)

Now it doesn't stack.

This gives a visual representation of what I just calculated.
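If you do save the apply() output to a variable, you can reuse it both for the plot and for labels on the bars. A sketch, assuming the students and tablature objects from above; gender_percent and bar_midpoints are hypothetical names, and the ylim setting and text() labels are additions that were not part of the original plot:

```r
# Rebuild the example data and the percentage table from above
students <- data.frame(
  student = c("Carla", "Josh", "Amanda", "Gabriel", "Shannon", "Tiffany"),
  gender = c("Female", "Male", "Female", "Male", "Female", "Female")
)
tablature <- as.data.frame.matrix(prop.table(table(students)) * 100)

# Save the per-gender percentages once so they can be reused
gender_percent <- apply(tablature, 2, FUN = sum)

# barplot() invisibly returns the x positions of the bar midpoints
bar_midpoints <- barplot(gender_percent, col = c("green", "blue"),
                         xlab = "Gender", ylab = "Percentage of Total Students",
                         main = "Barplot showing the Percentages of Gender Represented Among Students",
                         cex.main = 1, ylim = c(0, 100))

# Write each rounded percentage halfway up its bar
text(bar_midpoints, gender_percent / 2,
     labels = paste0(round(gender_percent, 1), "%"))
```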

Upvotes: 0

PhillC

Reputation: 66

There are already some good answers to this question, but as the original submitter admits to being new to R, I wanted to provide a very long form answer. The answer below takes more than the minimum necessary number of steps and doesn't use helpers like pipes.

Hopefully, providing an answer in this way helps the original submitter understand what is happening with each step.

# Load the dplyr library
library("dplyr")

# Create an example data frame
students <-
  data.frame(
    names = c("Bill", "Stacey", "Fred", "Jane", "Sarah"),
    gender = c("M", "F", "M", "F", "F"),
    stringsAsFactors = FALSE
  )

# Count the total number of students.
total_students <- nrow(students)

# Use dplyr filter to obtain just Female students
all_female_students <- dplyr::filter(students, gender %in% "F")

# Count total number of female students
total_female <- nrow(all_female_students)

# Repeat to find total number of male students
all_male_students <- dplyr::filter(students, gender %in% "M")

total_male <- nrow(all_male_students)

# Divide total female students by total students 
# and multiply result by 100 to obtain a percentage
percent_female <- (total_female / total_students) * 100

# Repeat for males
percent_male <- (total_male / total_students) * 100

> percent_female
[1] 60
> percent_male
[1] 40
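If you would rather not load dplyr, the same counts can be had in base R: comparing a column with a value gives a logical vector, and sum() counts its TRUE values. A sketch, assuming the same students data frame; percent_female_short is a hypothetical name for the one-line shortcut:

```r
# Same example data frame as above
students <- data.frame(
  names = c("Bill", "Stacey", "Fred", "Jane", "Sarah"),
  gender = c("M", "F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

total_students <- nrow(students)

# students$gender == "F" is TRUE/FALSE for each row; sum() counts the TRUEs
total_female <- sum(students$gender == "F")
total_male <- sum(students$gender == "M")

percent_female <- (total_female / total_students) * 100
percent_male <- (total_male / total_students) * 100

# mean() of a logical vector is the proportion of TRUEs, so this is a shortcut
percent_female_short <- mean(students$gender == "F") * 100
```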

Upvotes: 1

Luís Telles

Reputation: 694

You can use the table() function to produce a table telling you how many males and how many females are among the students. Then divide this table by the total number of students (you can get this with the length() function). Finally, multiply the result by 100.

Your code should be something like:

proportions <- table(your_data_frame$gender_column)/length(your_data_frame$gender_column)
percentages <- proportions*100
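Applied to a concrete data frame (the students example from the other answers, assumed here in place of your_data_frame), that could look like the sketch below. prop.table() divides a table by its total, so it should give the same result as dividing by length(); percentages_alt is a hypothetical name:

```r
# Example data frame standing in for your_data_frame
students <- data.frame(
  names = c("Bill", "Stacey", "Fred", "Jane", "Sarah"),
  gender = c("M", "F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Count each gender, then divide by the number of students
proportions <- table(students$gender) / length(students$gender)
percentages <- proportions * 100

# Equivalent: prop.table() divides each count by the table's total
percentages_alt <- prop.table(table(students$gender)) * 100

# Round for display
round(percentages, 1)
```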

Upvotes: 3

rosscova

Reputation: 5600

Another way using data.table:

students <- data.frame( names = c( "Bill", "Stacey", "Fred", "Jane", "Sarah" ), 
                        gender = c( "M", "F", "M", "F", "F" ),
                        stringsAsFactors = FALSE )

library( data.table )
setDT( students )[ , 100 * .N / nrow( students ), by = gender ]

#    gender V1
# 1:      M 40
# 2:      F 60

Or dplyr:

library( dplyr )
students %>% 
    group_by( gender ) %>% 
    summarise( percent = 100 * n() / nrow( students ) )

#  A tibble: 2 × 2
#   gender percent
#    <chr>   <dbl>
# 1      F      60
# 2      M      40
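With dplyr, count() can also do the grouping and counting in one step, and dividing by sum(n) avoids referring back to nrow() of the original data frame. A sketch, assuming the same students data frame; gender_summary is a hypothetical name:

```r
library(dplyr)

students <- data.frame(
  names = c("Bill", "Stacey", "Fred", "Jane", "Sarah"),
  gender = c("M", "F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# count() adds a column n with the number of rows per gender;
# n / sum(n) is then each gender's share of all students
gender_summary <- students %>%
  count(gender) %>%
  mutate(percent = 100 * n / sum(n))

gender_summary
```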

These are both popular packages for operations like these but, as has already been pointed out, you can also stick with base R if you prefer.

Upvotes: 9
