user2386786
user2386786

Reputation: 735

count unique set of columns in R

I have a dataframe that looks like this:

   FieldID X2009 X2010 X2011 X2012 X2013 X2014
1     H003     1     1     1     1     1     1
2     H001    NA     1     1     1     1     1
3     H005    NA     1     1     1     1     1
4     H006    NA     1     1     1     1     1
5     H009    NA     1     1     1    NA     1
6     H010    NA     1     1     1    NA     1
7     H002    NA     1     1     1    NA    NA
8     H007    NA     1     1     1    NA    NA
9     H008    NA     1     1    NA     1    NA
10    H004    NA     1     1    NA    NA     1

I try to count the number of rows falling into every unqique combination of X2009-X2014. So the outcome for the dataframe looks like:

   FieldID X2009 X2010 X2011 X2012 X2013 X2014 row
1     H003     1     1     1     1     1     1 1
2     H001    NA     1     1     1     1     1 3
5     H009    NA     1     1     1    NA     1 2
7     H002    NA     1     1     1    NA    NA 2
9     H008    NA     1     1    NA     1    NA 1
10    H004    NA     1     1    NA    NA     1 1

I tried doing the following:

  tt%>%
  gather(.,Year,value,X2009:X2014)%>%
  mutate(value=ifelse(is.na(value),0,1))%>%
  tidyr::spread(.,Year,value)%>%
  group_by(X2009,X2010,X2011,X2012,X2013,X2014)
  %>%summarise(row=n())

This gives me the error

> Error in n() : This function should not be called directly

Replacing n() with length() or NROW() did not help. How can I achieve this?

Upvotes: 2

Views: 92

Answers (1)

talat
talat

Reputation: 70336

Here's an option:

grps <- names(DF)[-1]          # get the grouping columns

DF %>% 
  group_by_(.dots = grps) %>%
  mutate(row = n()) %>%
  distinct()                   # you could add %>% ungroup() if required

#Source: local data frame [6 x 8]
#Groups: X2009, X2010, X2011, X2012, X2013, X2014
#
#  FieldID X2009 X2010 X2011 X2012 X2013 X2014 row
#1    H003     1     1     1     1     1     1   1
#2    H001    NA     1     1     1     1     1   3
#3    H009    NA     1     1     1    NA     1   2
#4    H002    NA     1     1     1    NA    NA   2
#5    H008    NA     1     1    NA     1    NA   1
#6    H004    NA     1     1    NA    NA     1   1

Edit:

Or without intermediate variable:

DF %>% 
    group_by_(.dots = names(.)[-1]) %>%
    mutate(row = n()) %>%
    distinct()

Upvotes: 1

Related Questions