Reputation: 843

Adding a new year variable to a data frame (with all other variables being duplicated)

I have a data frame containing a shape-file that I want to merge with another data-set that contains years. I'm interested in adding a variable with years to the former while all other variables remain the same for each year. I'm not sure how to do this.

As an example, say I have the following data-set:

a <- data.frame(code = c("aaa" , "bbb", "ccc") ,
            item = c("apples" , "bananas" , "carrots") ,
            id = c(1,2,3))

giving the following:

  code    item id
1  aaa  apples  1
2  bbb bananas  2
3  ccc carrots  3

I would like to add a new variable called year of length n that repeats all the same elements of the other variables for each year. For example, say I'd like to add the years 1990 to 1992 to an existing object like this:

  code    item id year
1  aaa  apples  1 1990
2  aaa  apples  1 1991
3  aaa  apples  1 1992
4  bbb bananas  2 1990
5  bbb bananas  2 1991
6  bbb bananas  2 1992
7  ccc carrots  3 1990
8  ccc carrots  3 1991
9  ccc carrots  3 1992

Is there a way of doing this (for existing data frames)? For this example I used this code;

b <- data.frame(code = rep(c("aaa" , "bbb", "ccc") , each = 3) ,
                item = rep(c("apples" , "bananas" , "carrots") , each = 3) ,
                id = rep(c(1,2,3) , each = 3) ,
                year = rep(c(1990:1992) , times = 3))

but this would not work (or is extremely inefficient) when the data-set is already there or extremely large. Is there a better way of doing this?

Upvotes: 4

Answers (5)

AndrewGB

Reputation: 16836

Using sqldf:

library(sqldf)

a <- data.frame(
  code = c("aaa" , "bbb", "ccc") ,
  item = c("apples" , "bananas" , "carrots") ,
  id = c(1, 2, 3)
)
y <- data.frame(year = 1990:1992)

sqldf("SELECT * FROM a
      CROSS JOIN y")

Output

  code    item id year
1  aaa  apples  1 1990
2  aaa  apples  1 1991
3  aaa  apples  1 1992
4  bbb bananas  2 1990
5  bbb bananas  2 1991
6  bbb bananas  2 1992
7  ccc carrots  3 1990
8  ccc carrots  3 1991
9  ccc carrots  3 1992

Upvotes: 1

PaulS

Reputation: 25313

Another possible solution, based on tidyr::separate_rows:

library(tidyverse)

a <- data.frame(code = c("aaa" , "bbb", "ccc") ,
                item = c("apples" , "bananas" , "carrots") ,
                id = c(1,2,3))

a %>% 
  mutate(year = list(1990:1992)) %>% 
  separate_rows(year, sep=",", convert = T)

#> # A tibble: 9 × 4
#>   code  item       id  year
#>   <chr> <chr>   <dbl> <int>
#> 1 aaa   apples      1  1990
#> 2 aaa   apples      1  1991
#> 3 aaa   apples      1  1992
#> 4 bbb   bananas     2  1990
#> 5 bbb   bananas     2  1991
#> 6 bbb   bananas     2  1992
#> 7 ccc   carrots     3  1990
#> 8 ccc   carrots     3  1991
#> 9 ccc   carrots     3  1992

Upvotes: 2

TarJae

Reputation: 78917

We could replicate the rows N times and then mutate the year column:

library(dplyr)

N <- 3
a %>% 
  group_by(code) %>% 
  slice(rep(1:n(), each = N)) %>% 
  mutate(year= 1990:1992)

  code  item       id  year
  <chr> <chr>   <dbl> <int>
1 aaa   apples      1  1990
2 aaa   apples      1  1991
3 aaa   apples      1  1992
4 bbb   bananas     2  1990
5 bbb   bananas     2  1991
6 bbb   bananas     2  1992
7 ccc   carrots     3  1990
8 ccc   carrots     3  1991
9 ccc   carrots     3  1992

Upvotes: 2

r2evans

Reputation: 160407

Base R:

b <- data.frame(year = 1990:1992)
merge(a, b, by = NULL)
#   code    item id year
# 1  aaa  apples  1 1990
# 2  bbb bananas  2 1990
# 3  ccc carrots  3 1990
# 4  aaa  apples  1 1991
# 5  bbb bananas  2 1991
# 6  ccc carrots  3 1991
# 7  aaa  apples  1 1992
# 8  bbb bananas  2 1992
# 9  ccc carrots  3 1992

Data

a <- structure(list(code = c("aaa", "bbb", "ccc"), item = c("apples", "bananas", "carrots"), id = c(1, 2, 3)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 4

akrun

Reputation: 886938

We can create a list column grouped by 'code' or 'id' and then unnest the list

library(dplyr)
library(tidyr)
a %>% 
  group_by(id) %>% 
  mutate(year = list(1990:1992)) %>% 
  ungroup %>%
  unnest(year)

-output

# A tibble: 9 × 4
  code  item       id  year
  <chr> <chr>   <dbl> <int>
1 aaa   apples      1  1990
2 aaa   apples      1  1991
3 aaa   apples      1  1992
4 bbb   bananas     2  1990
5 bbb   bananas     2  1991
6 bbb   bananas     2  1992
7 ccc   carrots     3  1990
8 ccc   carrots     3  1991
9 ccc   carrots     3  1992

Or use crossing

 crossing(a, year = 1990:1992)
# A tibble: 9 × 4
  code  item       id  year
  <chr> <chr>   <dbl> <int>
1 aaa   apples      1  1990
2 aaa   apples      1  1991
3 aaa   apples      1  1992
4 bbb   bananas     2  1990
5 bbb   bananas     2  1991
6 bbb   bananas     2  1992
7 ccc   carrots     3  1990
8 ccc   carrots     3  1991
9 ccc   carrots     3  1992

Upvotes: 3

Adding a new year variable to a data frame (with all other variables being duplicated)

Answers (5)

Related Questions