Marian Minar

Reputation: 1456

dplyr::summarize alphabetizes, but I need the original order

library(tidyverse)

I have a character vector of letters and multi-letter strings:

myletters <- c("A", "A", "B", "C", "C", "C", "C", "AA", "BB", "BB")

I'd like a count of each distinct value, arranged in order of first appearance. Identical values are always contiguous in the vector; they are never interleaved. For example, this will never happen:

mylettersNever <- c("A", "B", "A", "C", "C", "C", "C", "AA", "BB", "BB")

I tried a few things with table(), but it orders the result the same way as the following code, which doesn't do what I want:

myletters %>%
  tibble(letters = .) %>%
  group_by(letters) %>%
  summarise(n = n())

... because the output is

# A tibble: 5 x 2
  letters     n
  <chr>   <int>
1 A           2
2 AA          1
3 B           1
4 BB          2
5 C           4

... but I would like:

# A tibble: 5 x 2
  letters     n
  <chr>   <int>
1 A           2
2 B           1
3 C           4
4 AA          1
5 BB          2

Help?

Upvotes: 2

Views: 2106

Answers (2)

Shree

Reputation: 11140

Here's a hacky approach that works. Assign each row an id based on where its value first appears, count by that id together with letters, and drop the id afterwards. You can use count() directly, since it groups and ungroups behind the scenes.

myletters %>%
  tibble(letters = .) %>%
  count(id = match(letters, unique(letters)), letters) %>%
  select(-id)

# A tibble: 5 x 2
  letters     n
  <chr>   <int>
1 A           2
2 B           1
3 C           4
4 AA          1
5 BB          2
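
To see why the id trick works, here is a minimal base R sketch of the same idea, broken into steps. The names first_seen, id, counts and result are only illustrative, and tabulate() stands in for count(); this is an illustration of the mechanism, not a replacement for the pipeline above.

```r
myletters <- c("A", "A", "B", "C", "C", "C", "C", "AA", "BB", "BB")

# unique() keeps values in order of first appearance
first_seen <- unique(myletters)

# match() gives, for each element, its position in first_seen,
# so every distinct value gets an id reflecting when it first appeared
id <- match(myletters, first_seen)

# Counting by id therefore yields counts in the original order
counts <- tabulate(id, nbins = length(first_seen))

result <- data.frame(letters = first_seen, n = counts,
                     stringsAsFactors = FALSE)
print(result)
```

Because count() sorts by its grouping columns from left to right, putting id first is what makes the output follow first appearance rather than alphabetical order.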

Upvotes: 1

cderv

Reputation: 6542

You can use count() to count the rows for each value of a variable. To keep the original order, convert the character column to a factor with as_factor(), which sets the levels in order of first appearance; count() then returns the rows in level order.

library(tidyverse)
myletters <- c("A", "A", "B", "C", "C", "C", "C", "AA", "BB", "BB")

tibble(letters = myletters) %>%
  mutate(letters = as_factor(letters)) %>%
  count(letters)
#> # A tibble: 5 x 2
#>   letters     n
#>   <fct>   <int>
#> 1 A           2
#> 2 B           1
#> 3 C           4
#> 4 AA          1
#> 5 BB          2

Created on 2018-12-05 by the reprex package (v0.2.1)
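
For comparison, a rough base R sketch of the same factor idea, assuming you do not want to load forcats. Base factor() sorts its levels by default, which is why the original attempt came out alphabetized; passing levels = unique(x) reproduces the first-appearance order. The variable names here are illustrative only.

```r
myletters <- c("A", "A", "B", "C", "C", "C", "C", "AA", "BB", "BB")

# Default factor(): levels are sorted, which leads to alphabetical output
sorted_levels <- levels(factor(myletters))

# Explicit levels in order of first appearance
by_appearance <- factor(myletters, levels = unique(myletters))

# table() reports counts in level order
counts <- table(by_appearance)
print(counts)
```

The same levels = unique(x) argument also works inside mutate() if you prefer to stay in a dplyr pipeline.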

Upvotes: 3
