CurtLH
CurtLH

Reputation: 2417

Full-join after goruping dataframe in R using dplyr

I have a dataframe that looks like this:

+--------+---------------+----+
| period |     label     | n  |
+--------+---------------+----+
|      4 | Engaged       |  2 |
|      4 | Remarkable    |  1 |
|      5 | Engaged       |  1 |
|      5 | Inconsistent  |  2 |
|      5 | Remarkable    |  5 |
|      6 | Engaged       |  1 |
|      6 | Inconsistent  |  1 |
|      6 | Remarkable    |  5 |
|      7 | Engaged       |  2 |
|      7 | Remarkable    |  3 |
|      7 | Transactional |  2 |
+--------+---------------+----+

And I need to make it such that every option for label (Inconsistent, Transactional, Engaged, Remarkable) is present for every period. And if every label is not used in each period, then it should be inserted for that period with the value of n equal to 0.

I thought about pivoting the dataframe from long-to-wide, and then filling in missing values with 0, but sometimes every value might not be see in any of the periods. I also thought about grouping the dataframe by period and then doing a full join to all the labels, but it seems that groups are ignored when joining a dataframe.

I need to end up with a dataframe that looks like this:

+--------+---------------+----+
| period |     label     | n  |
+--------+---------------+----+
|      4 | Inconsistent  |  0 |
|      4 | Transactional |  0 |
|      4 | Engaged       |  2 |
|      4 | Remarkable    |  1 |
|      5 | Inconsistent  |  2 |
|      5 | Transactional |  0 |
|      5 | Engaged       |  1 |
|      5 | Remarkable    |  5 |
|      6 | Inconsistent  |  1 |
|      6 | Transactional |  0 |
|      6 | Engaged       |  1 |
|      6 | Remarkable    |  5 |
|      7 | Inconsistent  |  0 |
|      7 | Transactional |  2 |
|      7 | Engaged       |  2 |
|      7 | Remarkable    |  3 |
+--------+---------------+----+

Here is the example data that I am working with:

df <- as.data.frame(
  list(
    period = c(4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
                  7L, 7L, 7L),
    label = c(
      "Engaged",
      "Remarkable",
      "Engaged",
      "Inconsistent",
      "Remarkable",
      "Engaged",
      "Inconsistent",
      "Remarkable",
      "Engaged",
      "Remarkable",
      "Transactional"
    ),
    n = c(2L, 1L, 1L,
          2L, 5L, 1L, 1L, 5L, 2L, 3L, 1L)
  )
)

options <- as.data.frame(
  list(
    label = c(
      "Inconsistent",
      "Transactional",
      "Engaged",
      "Remarkable"
    ),
    n = c(0L, 0L, 0L, 0L)
  )
)

Upvotes: 1

Views: 116

Answers (1)

akrun
akrun

Reputation: 887128

We can group by 'period', and then complete the 'label' based on the values of 'label' from 'options' dataset

library(dplyr)
library(tidyr)
df %>% 
     group_by(period) %>%
     complete(label = options$label, fill = list(n = 0))

Upvotes: 4

Related Questions