TMWP

Reputation: 1625

Getting count of groupings in R - throws error - not a vector?

I have a data.frame with a head that looks like this:

> head(movies_by_yr)

Source: local data frame [6 x 4]
Groups: YR_Released [6]

               Movie_Title YR_Released Rating Num_Reviews
                <fctr>      <fctr>  <dbl>       <int>
1 The Shawshank Redemption        1994    9.2     1773755
2            The Godfather        1972    9.2     1211083
3   The Godfather: Part II        1974    9.0      832342
4          The Dark Knight        2008    8.9     1755341
5             12 Angry Men        1957    8.9      477276
6         Schindler's List        1993    8.9      909358

Note that when created, I specified stringsAsFactors=FALSE, so I believe the columns that got converted to factors were converted when I grouped the data frame in preparation for the next step:

movies_by_yr <- group_by(problem1_data, YR_Released)

Now we come to the problem. The goal is to group by YR_Released so we can get counts of records by year. I thought the next step would be something like this, but it throws an error and I am not sure what I am doing wrong:

summarise(movies_by_yr, total = nrow(YR_Released))

I chose nrow because once you have a grouping, the number of rows within that grouping should be the count. Can someone point me to what I am doing wrong?

The error thrown is:

Error in summarise_impl(.data, dots) : Not a vector

But I know this data.frame was created from a series of vectors, and whatever the difference is between the sample code from class and my attempt, I am just not seeing it. Hoping someone can answer this ...

Upvotes: 1

Views: 1079

Answers (2)

Sam Firke

Reputation: 23014

Let's use data that everyone has, like the built-in mtcars data.frame, to make this more useful for future readers.

If you look at the documentation ?nrow you'll see that function is meant to be called on a data.frame or matrix. You are calling it on a column, YR_Released, which is a vector; nrow() returns NULL for a plain vector, which is why summarise() complains. There is a variant of nrow, called (confusingly) NROW, that treats a vector as a one-column matrix. If you try that instead, it should work.
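To see the difference outside of dplyr, here is a minimal base R sketch. The vector x is made up for illustration (the values are just three of the years from the question):

```r
# A plain vector has no dim attribute
x <- c(1994, 1972, 1974)

nrow(x)       # NULL, because nrow() reads dim(x)[1]
NROW(x)       # 3, because NROW() treats the vector as a one-column matrix

# On a data.frame, both functions agree
nrow(mtcars)  # 32
NROW(mtcars)  # 32
```

Inside summarise(), a column is a plain vector like x, so nrow() has nothing to return while NROW() gives the group size.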

But even if it does, the intended dplyr way to count rows is with n(), like this:

library(dplyr)

mycars <- mtcars
mycars <- group_by(mycars, cyl)
# n() returns the number of rows in the current group
summarise(mycars, total = n())
#> # A tibble: 3 x 2
#>     cyl total
#>   <dbl> <int>
#> 1     4    11
#> 2     6     7
#> 3     8    14

And because it's such a common use case, the wrapper function count() will save you some code:

mtcars %>%
  count(cyl)
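As a rough check that the shortcut and the long form agree, the sketch below runs both on mtcars and compares the counts. The object names by_summarise and by_count are made up for illustration:

```r
library(dplyr)

# Long form: group, then count the rows in each group with n()
by_summarise <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n())

# Short form: count() does the grouping and the counting in one call
by_count <- mtcars %>%
  count(cyl)

# Both give the same per-group counts, in a column named n
all.equal(by_summarise$n, by_count$n)
```

Note that count() names its result column n by default, whereas with summarise() you pick the name yourself.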

Upvotes: 1

Nicolás Vila

Reputation: 1

Try this (I think it's what you want)

table(movies_by_yr$YR_Released)
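Since the question's data is not available, here is the same idea sketched on the built-in mtcars data. The names cyl_counts and cyl_df are made up for illustration:

```r
# table() tallies how often each distinct value occurs in a vector
cyl_counts <- table(mtcars$cyl)
cyl_counts

# The result is a table object, not a data.frame; convert it if needed
cyl_df <- as.data.frame(cyl_counts)
names(cyl_df)  # "Var1" "Freq"
```

This is base R, so it works whether or not the data frame has been grouped with dplyr.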

Upvotes: 0
