R Count values in DF variable by groups

Question

Using following dataset:

set.seed(2)
origin <- rep(c("DEU", "GBR", "ITA", "NLD", "CAN", "MEX", "USA", "CHN", "JPN", "KOR","DEU", "GBR", "ITA", "NLD", "CAN", "MEX", "USA", "CHN", "JPN", "KOR"), 4)
year <- rep(c(rep(1998, 10), rep(2000, 10)), 2)
type <- sample(1:10, size=length(origin), replace=TRUE)
value <- sample(100:10000, size=length(origin), replace=TRUE)
test.df <- as.data.frame(cbind(origin, year, type, value))
rm(origin, year, type, value)
### add some (6) missing values
test.df$value[sample(1:length(test.df$value), 6, replace = FALSE)] <- NA

I want to count how many types by country (origin) per year

I tryed:

count(trade.df, origin, year)

and

test.df %>% group_by(origin, year) %>% count()

but I am not sure of how I can interpret these results.

of course, if value == NA, R should not count it...

iod · Accepted Answer

To remove the rows where value is NA, use filter:

test.df %>% group_by(origin,year) %>% 
  filter(!is.na(value)) %>% count()

# A tibble: 20 x 3
# Groups:   origin, year [20]
   origin year      n
      
 1 CAN    1998      4
 2 CAN    2000      3
 3 CHN    1998      3
 4 CHN    2000      4
 5 DEU    1998      4
 6 DEU    2000      4
 7 GBR    1998      4
 8 GBR    2000      4
 9 ITA    1998      3
10 ITA    2000      4
11 JPN    1998      3
12 JPN    2000      3
13 KOR    1998      4
14 KOR    2000      4
15 MEX    1998      4
16 MEX    2000      4
17 NLD    1998      3
18 NLD    2000      4
19 USA    1998      4
20 USA    2000      4

Note, however, that this doesn't count how many types there are in each group, but how many rows there are. If you want to count the number of unique types, you can do this:

test.df %>% group_by(origin,year) %>% 
  filter(!is.na(value)) %>% 
  summarize(n_distinct(type)) #Merci, @Frank!

# A tibble: 20 x 3
# Groups:   origin [?]
   origin year  `length(unique(type))`
                       
 1 CAN    1998                       3
 2 CAN    2000                       3
 3 CHN    1998                       2
 4 CHN    2000                       3
 5 DEU    1998                       4
 6 DEU    2000                       3
 7 GBR    1998                       4
 8 GBR    2000                       4
 9 ITA    1998                       3
10 ITA    2000                       4
11 JPN    1998                       3
12 JPN    2000                       2
13 KOR    1998                       4
14 KOR    2000                       4
15 MEX    1998                       3
16 MEX    2000                       3
17 NLD    1998                       2
18 NLD    2000                       3
19 USA    1998                       3
20 USA    2000                       4

R Count values in DF variable by groups

Answers (1)

Related Questions