Danny
Danny

Reputation: 574

Creating bins in tables

I have a set of data that shows test scores (raw$Score) by year (raw$Year).

I would like to create a table where I see the years as rows and scores in columns and for the scores to be grouped in increments of 5.

When I runtable(raw$Year,raw$Score), I get the following. How would I modify table(raw$Year,raw$Score) so that the counts will be grouped by 75-80, 81-85, 86-90, 91,-95, and 96-100?

   75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
  2006  0  1  0  0  1  3  1  0  1  1  1  1  0  0  1  0  0  0  2  1  0  2  0  1  1   6
  2007  0  0  1  0  1  1  0  0  0  0  1  1  0  0  0  0  0  1  0  0  0  4  2  2  1   5
  2008  0  1  2  0  2  2  1  2  1  0  0  0  0  0  0  1  2  0  0  0  1  0  2  3  2   8
  2009  1  1  3  0  1  1  6  0  2  1  1  3  1  0  0  2  0  1  0  0  0  1  2  2  1  13
  2010  2  5  0  2  3  1  1  3  2  1  1  0  2  1  2  2  1  0  0  2  2  2  2  2  4  24
  2011  2  1  1  0  3  1  0  1  2  0  0  1  2  1  1  2  3  0  2  1  2  0  0  2  3  12
  2012  1  1  3  5  4  1  1  1  1  0  2  0  1  1  0  1  0  0  1  2  1  0  3  1  2  24
  2013  2  2  3  3  1  2  2  3  1  0  4  3  0  2  1  0  1  1  2  0  2  0  2  2  2  20
  2014  0  4  3  1  1  3  1  4  2  0  4  2  1  1  2  3  0  1  1  1  1  2  3  1  4  20
  2015  2  1  1  3  4  2  2  2  0  0  1  3  4  2  3  1  2  3  1  0  2  0  0  2  2  22
  2016  2  2  2  3  3  2  0  2  4  3  5  2  2  4  3  4  2  5  2  1  1  2  2  3  2  20
  2017  2  7  5  3  3  3  4  6  0  2  1  2  1  3  2  2  0  7  1  3  3  2  3  1  5  17

Upvotes: 0

Views: 74

Answers (2)

Greg B
Greg B

Reputation: 96

I hope the below helps you get on the right track!

I first created a sample dataframe that I hope matches the one you are currently using! Check out How to make a great R reproducible example? for some guidance on how to write really good questions in the future!


library(dplyr)

# create an example data frame
set.seed(123)
raw <- data.frame(Year = rep(2006:2017,10), 
                  Score= rep(rnorm(12, mean = 80, sd = 10), 10))
head(raw)
#>   Year    Score
#> 1 2006 74.39524
#> 2 2007 77.69823
#> 3 2008 95.58708
#> 4 2009 80.70508
#> 5 2010 81.29288
#> 6 2011 97.15065

# create a new "group" column and assign each row into a group based on score
raw <- raw %>%
  mutate(group = if_else(Score < 75, "<75",
                         if_else(Score >= 75 & Score < 80, "75-80",
                         if_else(Score >= 80 & Score < 85, "80-85",
                         if_else(Score >= 85 & Score < 90, "85-90",
                         if_else(Score >= 90 & Score < 95, "90-95",
                         if_else(Score >= 95 & Score <= 100, "95-100", "error")))))))

head(raw)
#>   Year    Score  group
#> 1 2006 74.39524    <75
#> 2 2007 77.69823  75-80
#> 3 2008 95.58708 95-100
#> 4 2009 80.70508  80-85
#> 5 2010 81.29288  80-85
#> 6 2011 97.15065 95-100

# summarise counts by year
raw %>%
  group_by(group) %>%
  summarise(n = n())
#> # A tibble: 5 x 2
#>    group     n
#>    <chr> <int>
#> 1    <75    30
#> 2  75-80    20
#> 3  80-85    40
#> 4  90-95    10
#> 5 95-100    20

Upvotes: 1

Ben Bolker
Ben Bolker

Reputation: 226182

use cut:

with(raw,table(Year,
   cut(Score,breaks=seq(75,100,by=5),
       right=TRUE,include.lowest=TRUE)))

(I think: you may need to fuss with right and include.lowest a bit)

PS with() isn't really necessary, it just lets us not repeat raw$ twice in the code ...

Upvotes: 3

Related Questions