Lloyd Christmas
Lloyd Christmas

Reputation: 1038

Extracting one value of a group to new column

I'm trying to extract one value of a group in a long-format data structure, and spread it into a column. Best explained with an example. See sample data below. In this case, I want to extract the value for c and replicate it in a new column, based on the groupings present in the data.

I'm looking for an elegant way to achieve this, in particular a tidyverse solution would be ideal.

  year month location group value
  2019     1 top      a         1
  2019     1 top      b         2
  2019     1 top      c         3
  2019     1 bottom   a         4
  2019     1 bottom   b         5
  2019     1 bottom   c         6
  2019     2 top      a         7
  2019     2 top      b         8
  2019     2 top      c         9
  2019     2 bottom   a        10
  2019     2 bottom   b        11
  2019     2 bottom   c        12

This is the expected output:

  year month location group value c_value
  2019     1 top      a         1       3
  2019     1 top      b         2       3
  2019     1 top      c         3       3
  2019     1 bottom   a         4       6
  2019     1 bottom   b         5       6
  2019     1 bottom   c         6       6
  2019     2 top      a         7       9
  2019     2 top      b         8       9
  2019     2 top      c         9       9
  2019     2 bottom   a        10      12
  2019     2 bottom   b        11      12
  2019     2 bottom   c        12      12

And the data:

structure(list(year = c(2019, 2019, 2019, 2019, 2019, 2019, 2019, 
2019, 2019, 2019, 2019, 2019), month = c(1, 1, 1, 1, 1, 1, 2, 
2, 2, 2, 2, 2), location = c("top", "top", "top", "bottom", "bottom", 
"bottom", "top", "top", "top", "bottom", "bottom", "bottom"), 
    group = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "a", 
    "b", "c"), value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
    )), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
))

edit:

I did come up with a two part solution, but I still fell there is a better way.

lookup <- df %>%
  group_by(year, month, location) %>%
  filter(group == "c") %>%
  summarize(c_value = value)


df %>%
  left_join(lookup, by = c("year", "month", "location"))

Upvotes: 2

Views: 185

Answers (4)

ThomasIsCoding
ThomasIsCoding

Reputation: 101189

Here is a base R option using merge + subset

merge(
  subset(df, select = -value),
  subset(df, group == "c", select = -group),
  all = TRUE
)

which gives

   year month location group value
1  2019     1   bottom     a     6
2  2019     1   bottom     b     6
3  2019     1   bottom     c     6
4  2019     1      top     a     3
5  2019     1      top     b     3
6  2019     1      top     c     3
7  2019     2   bottom     a    12
8  2019     2   bottom     b    12
9  2019     2   bottom     c    12
10 2019     2      top     a     9
11 2019     2      top     b     9
12 2019     2      top     c     9

Upvotes: 0

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

df %>% 
  group_by(year, month, location) %>% 
  mutate(c_value = value[match("c", group)])

# A tibble: 12 x 6
    year month location group value c_value
   <dbl> <dbl> <chr>    <chr> <dbl>   <dbl>
 1  2019     1 top      a         1       3
 2  2019     1 top      b         2       3
 3  2019     1 top      c         3       3
 4  2019     1 bottom   a         4       6
 5  2019     1 bottom   b         5       6
 6  2019     1 bottom   c         6       6
 7  2019     2 top      a         7       9
 8  2019     2 top      b         8       9
 9  2019     2 top      c         9       9
10  2019     2 bottom   a        10      12
11  2019     2 bottom   b        11      12
12  2019     2 bottom   c        12      12

use data.table

library(data.table)
dt <- as.data.table(df)
dt[, c_value := value[match("c", group)], by = c("year", "month", "location")]

Upvotes: 2

r2evans
r2evans

Reputation: 160417

dat %>%
  group_by(year, month, location) %>%
  mutate(c_value = value[group == "c"][1]) %>%
  ungroup()
# # A tibble: 12 x 6
#     year month location group value c_value
#    <dbl> <dbl> <chr>    <chr> <dbl>   <dbl>
#  1  2019     1 top      a         1       3
#  2  2019     1 top      b         2       3
#  3  2019     1 top      c         3       3
#  4  2019     1 bottom   a         4       6
#  5  2019     1 bottom   b         5       6
#  6  2019     1 bottom   c         6       6
#  7  2019     2 top      a         7       9
#  8  2019     2 top      b         8       9
#  9  2019     2 top      c         9       9
# 10  2019     2 bottom   a        10      12
# 11  2019     2 bottom   b        11      12
# 12  2019     2 bottom   c        12      12

The additional [1] guards against two error conditions:

  1. "c" not found:

    dat %>%
      group_by(year, month, location) %>%
      mutate(c_value = value[group == "d"]) %>%
      ungroup()
    # Error: Problem with `mutate()` input `c_value`.
    # x Input `c_value` can't be recycled to size 3.
    # i Input `c_value` is `value[group == "d"]`.
    # i Input `c_value` must be size 3 or 1, not 0.
    # i The error occured in group 1: year = 2019, month = 1, location = "bottom".
    
  2. Multiple "c" found:

    dat$group[2] <- "c"
    dat %>%
      group_by(year, month, location) %>%
      mutate(c_value = value[group == "c"]) %>%
      ungroup()
    # Error: Problem with `mutate()` input `c_value`.
    # x Input `c_value` can't be recycled to size 3.
    # i Input `c_value` is `value[group == "c"]`.
    # i Input `c_value` must be size 3 or 1, not 2.
    

Both are mitigate with the [1], though the second is silently truncated. With the original data:

dat %>%
  group_by(year, month, location) %>%
  mutate(c_value = value[group == "d"][1]) %>%
  ungroup()
# # A tibble: 12 x 6
#     year month location group value c_value
#    <dbl> <dbl> <chr>    <chr> <dbl>   <dbl>
#  1  2019     1 top      a         1      NA
#  2  2019     1 top      b         2      NA
#  3  2019     1 top      c         3      NA
#  4  2019     1 bottom   a         4      NA
#  5  2019     1 bottom   b         5      NA
#  6  2019     1 bottom   c         6      NA
#  7  2019     2 top      a         7      NA
#  8  2019     2 top      b         8      NA
#  9  2019     2 top      c         9      NA
# 10  2019     2 bottom   a        10      NA
# 11  2019     2 bottom   b        11      NA
# 12  2019     2 bottom   c        12      NA

dat$group[2] <- "c"
dat %>%
  group_by(year, month, location) %>%
  mutate(c_value = value[group == "c"][1]) %>%
  ungroup()
# # A tibble: 12 x 6
#     year month location group value c_value
#    <dbl> <dbl> <chr>    <chr> <dbl>   <dbl>
#  1  2019     1 top      a         1       2
#  2  2019     1 top      c         2       2
#  3  2019     1 top      c         3       2
#  4  2019     1 bottom   a         4       6
#  5  2019     1 bottom   b         5       6
#  6  2019     1 bottom   c         6       6
#  7  2019     2 top      a         7       9
#  8  2019     2 top      b         8       9
#  9  2019     2 top      c         9       9
# 10  2019     2 bottom   a        10      12
# 11  2019     2 bottom   b        11      12
# 12  2019     2 bottom   c        12      12

An alternate to your left_join is a little shorter:

filter(dat, group == "c") %>%
  select(-group, c_value = value) %>%
  left_join(dat, ., by = c("year", "month", "location"))

Upvotes: 2

Duck
Duck

Reputation: 39595

You can also try extracting value with a logical test around group using brackets inside mutate(). Here the code:

library(dplyr)
library(tidyr)
#Code
df %>% group_by(year,month,location) %>%
  mutate(value_c=value[group=='c'])

Output:

# A tibble: 12 x 6
# Groups:   year, month, location [4]
    year month location group value value_c
   <int> <int> <chr>    <chr> <int>   <int>
 1  2019     1 top      a         1       3
 2  2019     1 top      b         2       3
 3  2019     1 top      c         3       3
 4  2019     1 bottom   a         4       6
 5  2019     1 bottom   b         5       6
 6  2019     1 bottom   c         6       6
 7  2019     2 top      a         7       9
 8  2019     2 top      b         8       9
 9  2019     2 top      c         9       9
10  2019     2 bottom   a        10      12
11  2019     2 bottom   b        11      12
12  2019     2 bottom   c        12      12

Some data used:

#Data
df <- structure(list(year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L), month = c(1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), location = c("top", 
"top", "top", "bottom", "bottom", "bottom", "top", "top", "top", 
"bottom", "bottom", "bottom"), group = c("a", "b", "c", "a", 
"b", "c", "a", "b", "c", "a", "b", "c"), value = 1:12), class = "data.frame", row.names = c(NA, 
-12L))

Upvotes: 1

Related Questions