Reputation: 1038
I'm trying to extract one value of a group in a long-format data structure, and spread it into a column. Best explained with an example. See sample data below. In this case, I want to extract the value for c and replicate it in a new column, based on the groupings present in the data.
I'm looking for an elegant way to achieve this, in particular a tidyverse
solution would be ideal.
year month location group value
2019 1 top a 1
2019 1 top b 2
2019 1 top c 3
2019 1 bottom a 4
2019 1 bottom b 5
2019 1 bottom c 6
2019 2 top a 7
2019 2 top b 8
2019 2 top c 9
2019 2 bottom a 10
2019 2 bottom b 11
2019 2 bottom c 12
This is the expected output:
year month location group value c_value
2019 1 top a 1 3
2019 1 top b 2 3
2019 1 top c 3 3
2019 1 bottom a 4 6
2019 1 bottom b 5 6
2019 1 bottom c 6 6
2019 2 top a 7 9
2019 2 top b 8 9
2019 2 top c 9 9
2019 2 bottom a 10 12
2019 2 bottom b 11 12
2019 2 bottom c 12 12
And the data:
structure(list(year = c(2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019), month = c(1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 2, 2), location = c("top", "top", "top", "bottom", "bottom",
"bottom", "top", "top", "top", "bottom", "bottom", "bottom"),
group = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "a",
"b", "c"), value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
))
edit:
I did come up with a two part solution, but I still fell there is a better way.
lookup <- df %>%
group_by(year, month, location) %>%
filter(group == "c") %>%
summarize(c_value = value)
df %>%
left_join(lookup, by = c("year", "month", "location"))
Upvotes: 2
Views: 185
Reputation: 101189
Here is a base R option using merge
+ subset
merge(
subset(df, select = -value),
subset(df, group == "c", select = -group),
all = TRUE
)
which gives
year month location group value
1 2019 1 bottom a 6
2 2019 1 bottom b 6
3 2019 1 bottom c 6
4 2019 1 top a 3
5 2019 1 top b 3
6 2019 1 top c 3
7 2019 2 bottom a 12
8 2019 2 bottom b 12
9 2019 2 bottom c 12
10 2019 2 top a 9
11 2019 2 top b 9
12 2019 2 top c 9
Upvotes: 0
Reputation: 8880
df %>%
group_by(year, month, location) %>%
mutate(c_value = value[match("c", group)])
# A tibble: 12 x 6
year month location group value c_value
<dbl> <dbl> <chr> <chr> <dbl> <dbl>
1 2019 1 top a 1 3
2 2019 1 top b 2 3
3 2019 1 top c 3 3
4 2019 1 bottom a 4 6
5 2019 1 bottom b 5 6
6 2019 1 bottom c 6 6
7 2019 2 top a 7 9
8 2019 2 top b 8 9
9 2019 2 top c 9 9
10 2019 2 bottom a 10 12
11 2019 2 bottom b 11 12
12 2019 2 bottom c 12 12
use data.table
library(data.table)
dt <- as.data.table(df)
dt[, c_value := value[match("c", group)], by = c("year", "month", "location")]
Upvotes: 2
Reputation: 160417
dat %>%
group_by(year, month, location) %>%
mutate(c_value = value[group == "c"][1]) %>%
ungroup()
# # A tibble: 12 x 6
# year month location group value c_value
# <dbl> <dbl> <chr> <chr> <dbl> <dbl>
# 1 2019 1 top a 1 3
# 2 2019 1 top b 2 3
# 3 2019 1 top c 3 3
# 4 2019 1 bottom a 4 6
# 5 2019 1 bottom b 5 6
# 6 2019 1 bottom c 6 6
# 7 2019 2 top a 7 9
# 8 2019 2 top b 8 9
# 9 2019 2 top c 9 9
# 10 2019 2 bottom a 10 12
# 11 2019 2 bottom b 11 12
# 12 2019 2 bottom c 12 12
The additional [1]
guards against two error conditions:
"c"
not found:
dat %>%
group_by(year, month, location) %>%
mutate(c_value = value[group == "d"]) %>%
ungroup()
# Error: Problem with `mutate()` input `c_value`.
# x Input `c_value` can't be recycled to size 3.
# i Input `c_value` is `value[group == "d"]`.
# i Input `c_value` must be size 3 or 1, not 0.
# i The error occured in group 1: year = 2019, month = 1, location = "bottom".
Multiple "c"
found:
dat$group[2] <- "c"
dat %>%
group_by(year, month, location) %>%
mutate(c_value = value[group == "c"]) %>%
ungroup()
# Error: Problem with `mutate()` input `c_value`.
# x Input `c_value` can't be recycled to size 3.
# i Input `c_value` is `value[group == "c"]`.
# i Input `c_value` must be size 3 or 1, not 2.
Both are mitigate with the [1]
, though the second is silently truncated. With the original data:
dat %>%
group_by(year, month, location) %>%
mutate(c_value = value[group == "d"][1]) %>%
ungroup()
# # A tibble: 12 x 6
# year month location group value c_value
# <dbl> <dbl> <chr> <chr> <dbl> <dbl>
# 1 2019 1 top a 1 NA
# 2 2019 1 top b 2 NA
# 3 2019 1 top c 3 NA
# 4 2019 1 bottom a 4 NA
# 5 2019 1 bottom b 5 NA
# 6 2019 1 bottom c 6 NA
# 7 2019 2 top a 7 NA
# 8 2019 2 top b 8 NA
# 9 2019 2 top c 9 NA
# 10 2019 2 bottom a 10 NA
# 11 2019 2 bottom b 11 NA
# 12 2019 2 bottom c 12 NA
dat$group[2] <- "c"
dat %>%
group_by(year, month, location) %>%
mutate(c_value = value[group == "c"][1]) %>%
ungroup()
# # A tibble: 12 x 6
# year month location group value c_value
# <dbl> <dbl> <chr> <chr> <dbl> <dbl>
# 1 2019 1 top a 1 2
# 2 2019 1 top c 2 2
# 3 2019 1 top c 3 2
# 4 2019 1 bottom a 4 6
# 5 2019 1 bottom b 5 6
# 6 2019 1 bottom c 6 6
# 7 2019 2 top a 7 9
# 8 2019 2 top b 8 9
# 9 2019 2 top c 9 9
# 10 2019 2 bottom a 10 12
# 11 2019 2 bottom b 11 12
# 12 2019 2 bottom c 12 12
An alternate to your left_join
is a little shorter:
filter(dat, group == "c") %>%
select(-group, c_value = value) %>%
left_join(dat, ., by = c("year", "month", "location"))
Upvotes: 2
Reputation: 39595
You can also try extracting value
with a logical test around group
using brackets inside mutate()
. Here the code:
library(dplyr)
library(tidyr)
#Code
df %>% group_by(year,month,location) %>%
mutate(value_c=value[group=='c'])
Output:
# A tibble: 12 x 6
# Groups: year, month, location [4]
year month location group value value_c
<int> <int> <chr> <chr> <int> <int>
1 2019 1 top a 1 3
2 2019 1 top b 2 3
3 2019 1 top c 3 3
4 2019 1 bottom a 4 6
5 2019 1 bottom b 5 6
6 2019 1 bottom c 6 6
7 2019 2 top a 7 9
8 2019 2 top b 8 9
9 2019 2 top c 9 9
10 2019 2 bottom a 10 12
11 2019 2 bottom b 11 12
12 2019 2 bottom c 12 12
Some data used:
#Data
df <- structure(list(year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2019L,
2019L, 2019L, 2019L, 2019L, 2019L, 2019L), month = c(1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), location = c("top",
"top", "top", "bottom", "bottom", "bottom", "top", "top", "top",
"bottom", "bottom", "bottom"), group = c("a", "b", "c", "a",
"b", "c", "a", "b", "c", "a", "b", "c"), value = 1:12), class = "data.frame", row.names = c(NA,
-12L))
Upvotes: 1