Reputation: 2417
I have a dataframe that looks like this:
+--------+---------------+----+
| period | label | n |
+--------+---------------+----+
| 4 | Engaged | 2 |
| 4 | Remarkable | 1 |
| 5 | Engaged | 1 |
| 5 | Inconsistent | 2 |
| 5 | Remarkable | 5 |
| 6 | Engaged | 1 |
| 6 | Inconsistent | 1 |
| 6 | Remarkable | 5 |
| 7 | Engaged | 2 |
| 7 | Remarkable | 3 |
| 7 | Transactional | 2 |
+--------+---------------+----+
And I need to make it such that every option for label (Inconsistent, Transactional, Engaged, Remarkable
) is present for every period. And if every label is not used in each period, then it should be inserted for that period with the value of n
equal to 0.
I thought about pivoting the dataframe from long-to-wide, and then filling in missing values with 0, but sometimes every value might not be see in any of the periods. I also thought about grouping the dataframe by period and then doing a full join to all the labels, but it seems that groups are ignored when joining a dataframe.
I need to end up with a dataframe that looks like this:
+--------+---------------+----+
| period | label | n |
+--------+---------------+----+
| 4 | Inconsistent | 0 |
| 4 | Transactional | 0 |
| 4 | Engaged | 2 |
| 4 | Remarkable | 1 |
| 5 | Inconsistent | 2 |
| 5 | Transactional | 0 |
| 5 | Engaged | 1 |
| 5 | Remarkable | 5 |
| 6 | Inconsistent | 1 |
| 6 | Transactional | 0 |
| 6 | Engaged | 1 |
| 6 | Remarkable | 5 |
| 7 | Inconsistent | 0 |
| 7 | Transactional | 2 |
| 7 | Engaged | 2 |
| 7 | Remarkable | 3 |
+--------+---------------+----+
Here is the example data that I am working with:
df <- as.data.frame(
list(
period = c(4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
7L, 7L, 7L),
label = c(
"Engaged",
"Remarkable",
"Engaged",
"Inconsistent",
"Remarkable",
"Engaged",
"Inconsistent",
"Remarkable",
"Engaged",
"Remarkable",
"Transactional"
),
n = c(2L, 1L, 1L,
2L, 5L, 1L, 1L, 5L, 2L, 3L, 1L)
)
)
options <- as.data.frame(
list(
label = c(
"Inconsistent",
"Transactional",
"Engaged",
"Remarkable"
),
n = c(0L, 0L, 0L, 0L)
)
)
Upvotes: 1
Views: 116
Reputation: 887128
We can group by 'period', and then complete
the 'label' based on the values of 'label' from 'options' dataset
library(dplyr)
library(tidyr)
df %>%
group_by(period) %>%
complete(label = options$label, fill = list(n = 0))
Upvotes: 4