Full-join after goruping dataframe in R using dplyr

Question

I have a dataframe that looks like this:

+--------+---------------+----+
| period |     label     | n  |
+--------+---------------+----+
|      4 | Engaged       |  2 |
|      4 | Remarkable    |  1 |
|      5 | Engaged       |  1 |
|      5 | Inconsistent  |  2 |
|      5 | Remarkable    |  5 |
|      6 | Engaged       |  1 |
|      6 | Inconsistent  |  1 |
|      6 | Remarkable    |  5 |
|      7 | Engaged       |  2 |
|      7 | Remarkable    |  3 |
|      7 | Transactional |  2 |
+--------+---------------+----+

And I need to make it such that every option for label (Inconsistent, Transactional, Engaged, Remarkable) is present for every period. And if every label is not used in each period, then it should be inserted for that period with the value of n equal to 0.

I thought about pivoting the dataframe from long-to-wide, and then filling in missing values with 0, but sometimes every value might not be see in any of the periods. I also thought about grouping the dataframe by period and then doing a full join to all the labels, but it seems that groups are ignored when joining a dataframe.

I need to end up with a dataframe that looks like this:

+--------+---------------+----+
| period |     label     | n  |
+--------+---------------+----+
|      4 | Inconsistent  |  0 |
|      4 | Transactional |  0 |
|      4 | Engaged       |  2 |
|      4 | Remarkable    |  1 |
|      5 | Inconsistent  |  2 |
|      5 | Transactional |  0 |
|      5 | Engaged       |  1 |
|      5 | Remarkable    |  5 |
|      6 | Inconsistent  |  1 |
|      6 | Transactional |  0 |
|      6 | Engaged       |  1 |
|      6 | Remarkable    |  5 |
|      7 | Inconsistent  |  0 |
|      7 | Transactional |  2 |
|      7 | Engaged       |  2 |
|      7 | Remarkable    |  3 |
+--------+---------------+----+

Here is the example data that I am working with:

df <- as.data.frame(
  list(
    period = c(4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
                  7L, 7L, 7L),
    label = c(
      "Engaged",
      "Remarkable",
      "Engaged",
      "Inconsistent",
      "Remarkable",
      "Engaged",
      "Inconsistent",
      "Remarkable",
      "Engaged",
      "Remarkable",
      "Transactional"
    ),
    n = c(2L, 1L, 1L,
          2L, 5L, 1L, 1L, 5L, 2L, 3L, 1L)
  )
)

options <- as.data.frame(
  list(
    label = c(
      "Inconsistent",
      "Transactional",
      "Engaged",
      "Remarkable"
    ),
    n = c(0L, 0L, 0L, 0L)
  )
)

akrun · Accepted Answer

We can group by 'period', and then complete the 'label' based on the values of 'label' from 'options' dataset

library(dplyr)
library(tidyr)
df %>% 
     group_by(period) %>%
     complete(label = options$label, fill = list(n = 0))

Full-join after goruping dataframe in R using dplyr

Answers (1)

Related Questions