Daniel
Daniel

Reputation: 481

How to aggregate categorical data in R?

I have a dataframe which consists of two columns with categorical variables (Better, Similar, Worse). I would like to come up with a table which counts the number of times that these categories appear in the two columns. The dataframe I am using is as follows:

       Category.x  Category.y
1      Better      Better
2      Better      Better
3      Similar     Similar
4      Worse       Similar

I would like to come up with a table like this:

           Category.x    Category.y
Better     2             2
Similar    1             2
Worse      1             0

How would you go about it?

Upvotes: 8

Views: 5300

Answers (3)

tmfmnk
tmfmnk

Reputation: 40171

One dplyr and tidyr possibility could be:

df %>%
 gather(var, val) %>%
 count(var, val) %>%
 spread(var, n, fill = 0)

  val     Category.x Category.y
  <chr>        <dbl>      <dbl>
1 Better           2          2
2 Similar          1          2
3 Worse            1          0

It, first, transforms the data from wide to long format, with column "var" including the variable names and column "val" the corresponding values. Second, it counts per "var" and "val". Finally, it spreads the data into the desired format.

Or with dplyr and reshape2 you can do:

df %>%
 mutate(rowid = row_number()) %>%
 melt(., id.vars = "rowid") %>%
 count(variable, value) %>%
 dcast(value ~ variable, value.var = "n", fill = 0)

    value Category.x Category.y
1  Better          2          2
2 Similar          1          2
3   Worse          1          0

Upvotes: 2

Frank
Frank

Reputation: 66819

As mentioned in the comments, table is standard for this, like

table(stack(DT))

         ind
values    Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

table(value = unlist(DT), cat = names(DT)[col(DT)])

         cat
value     Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

with(reshape(DT, direction = "long", varying = 1:2), 
  table(value = Category, cat = time)
)

         cat
value     x y
  Better  2 2
  Similar 1 2
  Worse   1 0

Upvotes: 8

d.b
d.b

Reputation: 32558

sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
#        Category.x Category.y
#Better           2          2
#Similar          1          2
#Worse            1          0

Upvotes: 3

Related Questions