lamentaciones

Reputation: 31

Tidyverse and R: how to count rows in a tibble of a nested dataframe

I've checked multiple posts and haven't found an answer. According to this, my code should work, but it doesn't.

Objective: I want to essentially print out the number of subjects--which in this case is also the number of rows in this tibble.

Code:

data<-read.csv("advanced_r_programming/data/MIE.csv")

make_LD<-function(x){
  LongitudinalData<-x%>%
    group_by(id)%>%
    nest()
  structure(list(LongitudinalData), class = "LongitudinalData")
}

print.LongitudinalData<-function(x){
  paste("Longitudinal dataset with", x[["id"]], "subjects")

}

x<-make_LD(data)

print(x)

Here's the head of the dataset I'm working on:

> head(x)
[[1]]
# A tibble: 10 x 2
      id                  data
   <int>                <list>
 1    14 <tibble [11,945 x 4]>
 2    20 <tibble [11,497 x 4]>
 3    41 <tibble [11,636 x 4]>
 4    44 <tibble [13,104 x 4]>
 5    46 <tibble [13,812 x 4]>
 6    54 <tibble [10,944 x 4]>
 7    64 <tibble [11,367 x 4]>
 8    74 <tibble [11,517 x 4]>
 9   104 <tibble [11,232 x 4]>
10   106 <tibble [13,823 x 4]>

Output:

[1] "Longitudinal dataset with  subjects"

I've tried every combination suggested in the aforementioned Stack Overflow post and none of them works.

Upvotes: 3

Views: 12378

Answers (2)

Mario Reutter

Reputation: 359

The tidyverse (dplyr) has a dedicated function for this: n(), which returns the number of rows in the current group.

You can simply do: mtcars %>% group_by(cyl) %>% summarise(rows = n())

> mtcars %>% group_by(cyl) %>% summarise(rows = n())
# A tibble: 3 x 2
    cyl  rows
  <dbl> <int>
1     4    11
2     6     7
3     8    14

In more complex cases, where each subject spans multiple rows ("long format" data), you can summarise twice (assuming here that hp denotes the subject):

> mtcars %>% group_by(cyl, hp) %>% #always group by subject-ID last
+   summarise(n = n()) %>% #observations per subject and cyl
+   summarise(n = n()) #subjects per cyl (implicitly summarises across all group-variables except the last)
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
# A tibble: 3 x 2
    cyl     n
  <dbl> <int>
1     4    10
2     6     4
3     8     9

Note that n in the second case is smaller than in the first because cars that share the same cyl and hp values are now counted as a single "subject".
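
As an alternative to the two-step summarise() above, dplyr's n_distinct() can count the distinct subject IDs per group in one step. This is only a sketch under the same assumption as above, namely that hp stands in for a subject ID; the name subjects_per_cyl is made up for illustration.

```r
library(dplyr)

# Count distinct "subjects" per cyl in a single summarise().
# hp is only a stand-in for a subject ID, as in the example above.
subjects_per_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n_distinct(hp))

subjects_per_cyl
```

This should give the same counts as the double summarise() (10, 4 and 9 for 4, 6 and 8 cylinders) without relying on the implicit dropping of the last grouping variable.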

Upvotes: 0

eipi10

Reputation: 93891

Here are two options:

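library(tidyverse)

# Create a nested data frame
dat = mtcars %>% 
  group_by(cyl) %>% 
  nest() %>% 
  as_tibble()   # as.tibble() is deprecated in favour of as_tibble()

dat
#>     cyl               data
#> 1     6  <tibble [7 x 10]>
#> 2     4 <tibble [11 x 10]>
#> 3     8 <tibble [14 x 10]>

# Option 1: apply nrow() to each element of the list-column
dat %>% 
  mutate(nrow=map_dbl(data, nrow))

# Option 2: group by the key and count the rows of each group's nested data
dat %>% 
  group_by(cyl) %>% 
  mutate(nrow = nrow(data.frame(data)))
#>     cyl               data  nrow
#> 1     6  <tibble [7 x 10]>     7
#> 2     4 <tibble [11 x 10]>    11
#> 3     8 <tibble [14 x 10]>    14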
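
Applied to the object from the question, the number of subjects is simply the number of rows of the nested tibble. Note that make_LD() wraps that tibble in a list, so x[["id"]] returns NULL; the tibble itself is x[[1]]. The sketch below is one possible fix, not the only one. Since MIE.csv is not available here, the small data frame (and its value column) is a made-up stand-in.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for MIE.csv: three subjects with a few rows each
data <- data.frame(
  id    = c(14, 14, 20, 20, 20, 41),
  value = c(1.2, 0.8, 2.5, 2.1, 1.9, 0.4)
)

make_LD <- function(x) {
  nested <- x %>%
    group_by(id) %>%
    nest()
  # Same wrapping as in the question: the tibble becomes element 1 of a list
  structure(list(nested), class = "LongitudinalData")
}

print.LongitudinalData <- function(x, ...) {
  # The nested tibble has one row per subject, so nrow() is the subject count
  n_subjects <- nrow(x[[1]])
  # cat() writes the message to the console; paste() alone only returns a string
  cat("Longitudinal dataset with", n_subjects, "subjects\n")
  invisible(x)
}

x <- make_LD(data)
print(x)
```

With the stand-in data this prints "Longitudinal dataset with 3 subjects". Returning x invisibly follows the usual convention for print methods.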

Upvotes: 5
