Johannes
Johannes

Reputation: 1064

Replicate a row (specific columns) based on a vector

Is there a straightforward way in dplyr to expand a dataframe by replicating a row based on a specific vector?

For example, I have following dataframe:

df <- tibble(Year=c(2019),
       cat1=c("A","B"),
       cat2=c("X","Y"),
       value1=c(1,2),
       value2=c(10,20))

selected_years <- c(2019:2021)

where I would like to replicate the row where cat1=="A" for the years 2019-2021. The values of some columns (value1, cat) should be taken from the original year 2019, some other columns (value2) filled with NAs.

The final output should look like:

Year cat   value1 value2
2019 A          1     10
2020 A          1     NA
2021 A          1     NA
2019 B          2     20

I tried with bind_rows()...however, the result is not fully what I wanted (I only get the "A"-Part not the "B"-Part), and I am not sure if this is really the most intuitive/dplyr way to go, or if another approach (or even specific function) would be more reasonable:

df%>%
  filter(cat1=="A",Year==2019)%>%
  bind_rows(
    data.frame(
    Year=setdiff(selected_years,.$Year),
    cat1=.$cat1,
    value1=.$value1
    )
  )
)

Edit: I also tried using expand and right_join, but I then my desired column values are not repeated:

df %>% 
  dplyr::right_join(df %>%
                      filter(cat1=="A",Year==2019)%>%
                      expand(Year=c(2019:2021)))

Maybe an approach involving case_when?

Upvotes: 4

Views: 269

Answers (3)

PaulS
PaulS

Reputation: 25528

A solution based on dplyr::bind_rows:

library(tidyverse)

df <- tibble(Year=c(2019),
             cat1=c("A","B"),
             cat2=c("X","Y"),
             value1=c(1,2),
             value2=c(10,20))

selected_years <- c(2020:2021)

df %>% 
  bind_rows(data.frame(
    Year=selected_years, filter(., cat1 == "A") %>% select(-Year, -value2))) %>% 
    arrange(cat1)

#> # A tibble: 4 × 5
#>    Year cat1  cat2  value1 value2
#>   <dbl> <chr> <chr>  <dbl>  <dbl>
#> 1  2019 A     X          1     10
#> 2  2020 A     X          1     NA
#> 3  2021 A     X          1     NA
#> 4  2019 B     Y          2     20

Upvotes: 0

TarJae
TarJae

Reputation: 79311

library(tidyverse)
tibble(selected_years) %>% 
  mutate(cat1 = "A") %>% 
  full_join(df, by = "cat1") %>% 
  mutate(selected_years = ifelse(is.na(selected_years), Year, selected_years)) %>% 
  group_by(cat1) %>% 
  mutate(value2 = ifelse(row_number() != 1, NA, value2)) %>%
  ungroup() %>% 
  select(Year = selected_years, cat = cat1, value1, value2)
   Year cat   value1 value2
  <dbl> <chr>  <dbl>  <dbl>
1  2019 A          1     10
2  2020 A          1     NA
3  2021 A          1     NA
4  2019 B          2     20

Upvotes: 0

TimTeaFan
TimTeaFan

Reputation: 18581

The part where you only want to keep specfic values and others not, makes this tricky. It is easy to expand the vector on all values using rowwise and unnest together with the condition in if_else. In the last step we just reset the values to NA which we don't want to replicate. If you have more than one value that you want to set NA, we can use across.

library(tidyverse)

df <- tibble(Year=c(2019),
             cat1=c("A","B"),
             cat2=c("X","Y"),
             value1=c(1,2),
             value2=c(10,20))

selected_years <- c(2019:2021)


df %>% 
  rowwise %>% 
  mutate(Year = if_else(cat1 == "A", list(selected_years), list(Year))) %>% 
  unnest(Year) %>% 
  mutate(value2 = if_else(Year != 2019, NA_real_, value2))

#> # A tibble: 4 x 5
#>    Year cat1  cat2  value1 value2
#>   <dbl> <chr> <chr>  <dbl>  <dbl>
#> 1  2019 A     X          1     10
#> 2  2020 A     X          1     NA
#> 3  2021 A     X          1     NA
#> 4  2019 B     Y          2     20

Created on 2021-12-08 by the reprex package (v2.0.1)


Or we could create a df2 and full_join it with df:

library(dplyr)

df2 <- tibble(Year = selected_years,
             cat1 = "A",
             cat2 = "X",
             value1 = 1)

df %>% 
  full_join(df2, by = c("Year", "cat1", "cat2", "value1"))

#> # A tibble: 4 x 5
#>    Year cat1  cat2  value1 value2
#>   <dbl> <chr> <chr>  <dbl>  <dbl>
#> 1  2019 A     X          1     10
#> 2  2019 B     Y          2     20
#> 3  2020 A     X          1     NA
#> 4  2021 A     X          1     NA

Created on 2021-12-08 by the reprex package (v2.0.1)

Upvotes: 1

Related Questions