nimliug
nimliug

Reputation: 391

Create dummies variables according to year column (pivot)

I have this data table :

library(data.table)
dt = data.table(customer = '123',
                year = 2008:2012,
                sales_revenues_by_year = c(1966.4,180.2,903,1138.34,1593.66),
                height_sales_by_year = c(260,10,100,120,146),
                nb_orders_by_year = c(2,1,2,1,4),
                nb_item_ordered_by_year = c(4,4,2,2,26),
                customer_age = c(rep(12.85,5)),
                growth_by_year = c(NA, -0.91, 4.01, 0.26, 0.4))


   customer year sales_revenues_by_year height_sales_by_year nb_orders_by_year nb_item_ordered_by_year customer_age growth_by_year
1:      123 2008                1966.40                  260                 2                       4        12.85             NA
2:      123 2009                 180.20                   10                 1                       4        12.85          -0.91
3:      123 2010                 903.00                  100                 2                       2        12.85           4.01
4:      123 2011                1138.34                  120                 1                       2        12.85           0.26
5:      123 2012                1593.66                  146                 4                      26        12.85           0.40

I want to create an output data where each columns from sales_revenues_by_year to growth_by_year will become sales_revenues_by_year_2008, sales_revenues_by_year_2009, ..., ..., growth_by_year_2008, ..., growth_by_year_2012.

The year column is my pivot table. At the end, for this customer (123) I only have 1 row with a lot of columns (5 columns(2008,2009,2010,2011,2012) by new variables)

Thanks a lot for any answer !!

Upvotes: 0

Views: 25

Answers (1)

Duck
Duck

Reputation: 39595

One option with dplyr and tidyr:

library(dplyr)
library(tidyr)
#Code
newdt <- dt %>% pivot_longer(-c(customer,year)) %>%
  mutate(var=paste0(name,'_',year)) %>%
  select(-c(year,name)) %>%
  pivot_wider(names_from = var,values_from=value,values_fill = 0)

Output:

# A tibble: 1 x 31
  customer sales_revenues_~ height_sales_by~ nb_orders_by_ye~ nb_item_ordered~ customer_age_20~
  <chr>               <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
1 123                 1966.              260                2                4             12.8
# ... with 25 more variables: growth_by_year_2008 <dbl>, sales_revenues_by_year_2009 <dbl>,
#   height_sales_by_year_2009 <dbl>, nb_orders_by_year_2009 <dbl>,
#   nb_item_ordered_by_year_2009 <dbl>, customer_age_2009 <dbl>, growth_by_year_2009 <dbl>,
#   sales_revenues_by_year_2010 <dbl>, height_sales_by_year_2010 <dbl>,
#   nb_orders_by_year_2010 <dbl>, nb_item_ordered_by_year_2010 <dbl>, customer_age_2010 <dbl>,
#   growth_by_year_2010 <dbl>, sales_revenues_by_year_2011 <dbl>, height_sales_by_year_2011 <dbl>,
#   nb_orders_by_year_2011 <dbl>, nb_item_ordered_by_year_2011 <dbl>, customer_age_2011 <dbl>,
#   growth_by_year_2011 <dbl>, sales_revenues_by_year_2012 <dbl>, height_sales_by_year_2012 <dbl>,
#   nb_orders_by_year_2012 <dbl>, nb_item_ordered_by_year_2012 <dbl>, customer_age_2012 <dbl>,
#   growth_by_year_2012 <dbl>

Or more directly:

#Code 2
newdt <- dt %>% pivot_wider(names_from=year,
                            values_from=c(sales_revenues_by_year:growth_by_year),
                            values_fill = 0)

Output:

# A tibble: 1 x 31
  customer sales_revenues_~ sales_revenues_~ sales_revenues_~ sales_revenues_~ sales_revenues_~
  <chr>               <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
1 123                 1966.             180.              903            1138.            1594.
# ... with 25 more variables: height_sales_by_year_2008 <dbl>, height_sales_by_year_2009 <dbl>,
#   height_sales_by_year_2010 <dbl>, height_sales_by_year_2011 <dbl>,
#   height_sales_by_year_2012 <dbl>, nb_orders_by_year_2008 <dbl>, nb_orders_by_year_2009 <dbl>,
#   nb_orders_by_year_2010 <dbl>, nb_orders_by_year_2011 <dbl>, nb_orders_by_year_2012 <dbl>,
#   nb_item_ordered_by_year_2008 <dbl>, nb_item_ordered_by_year_2009 <dbl>,
#   nb_item_ordered_by_year_2010 <dbl>, nb_item_ordered_by_year_2011 <dbl>,
#   nb_item_ordered_by_year_2012 <dbl>, customer_age_2008 <dbl>, customer_age_2009 <dbl>,
#   customer_age_2010 <dbl>, customer_age_2011 <dbl>, customer_age_2012 <dbl>,
#   growth_by_year_2008 <dbl>, growth_by_year_2009 <dbl>, growth_by_year_2010 <dbl>,
#   growth_by_year_2011 <dbl>, growth_by_year_2012 <dbl>

Upvotes: 1

Related Questions