Jazzmatazz
Jazzmatazz

Reputation: 645

Unnest and Pivot Longer with Duplicate Columns

I have a nested df that I am trying to clean up.

Sample Data:
df <- 
  tibble::tribble(
  ~idTeam, ~ptsTotalBehindFirst, ~ptsOverall, ~ptsDiffLastPeriod, ~rankOverall, ~ptsBattingBehindFirst, ~ptsBatting, ~ptsDiffBattingLastPeriod,                                                                                                                                                                            ~dataBatting, ~rankBatting, ~ptsPitchingBehindFirst, ~ptsPitching, ~ptsDiffPitchingLastPeriod,                                                                                                                                                                                                   ~dataPitching, ~rankPitching,
      "2",                  "0",       "111",               "-4",           1L,                    "0",        "65",                       "0", list(abbr = c("OBP", "HR", "RBI", "R", "SB"), roto_points = c(13, 13, 13, 13, 13), value = c(0.3663, 384, 1012, 1102, 164), diff = c(0, 0, 0, 0, 0), rank = c(1, 1, 1, 1, 1)),           1L,                     "5",         "46",                       "-4",                            list(abbr = c("S", "W", "K", "ERA", "WHIP"), roto_points = c(12, 6, 11, 8, 9), value = c(94, 89, 1576, 3.946, 1.2179), diff = c(0, -2, -2, 0, 0), rank = c(2, 8, 3, 6, 5)),            3L,
      "8",               "13.5",      "97.5",                "2",           2L,                   "13",        "52",                       "0",    list(abbr = c("OBP", "HR", "RBI", "R", "SB"), roto_points = c(12, 11, 11, 12, 6), value = c(0.3576, 323, 954, 1011, 89), diff = c(0, 0, 0, 0, 0), rank = c(2, 3, 3, 2, 8)),           3L,                   "5.5",       "45.5",                        "2", list(abbr = c("S", "W", "K", "ERA", "WHIP"), roto_points = c(2, 7.5, 10, "13", 13), value = c(56, 91, 1508, 3.688, 1.1474), diff = c(-1, 1.5, 0.5, 1, 0), rank = c(12, 6, 4, 1, 1)),            4L
  )

The data I am trying to unnest is stored in the dataBatting and dataPitching columns. I am trying to unnest all of the columns in both columns and bind the results as rows. Something akin to pivot_longer but I wasn't sure of the right way to do this with 4 duplicate columns nested within 2 separate columns.

My attempt to do this was:

  df %>% 
  unnest_wider(dataBatting) %>% 
  unnest(c(abbr, roto_points, value, diff, rank)) %>% 
  unnest_wider(dataPitching) %>% 
  unnest(c(abbr, roto_points, value, diff, rank))


Error is:
Error: Column names `abbr`, `roto_points`, `value`, `diff`, `rank` must not be duplicated.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace

My issue is that I want to bind the same columns from dataPitching that have the same column names as dataBatting (abbr, roto_points, value, diff, rank).

I also want to change the name of the columns that are duplicates. Is tidyr::hoist a better way to do this?

Desired df:

tibble::tribble(
                                              ~idTeam, ~ptsTotalBehindFirst, ~ptsOverall, ~ptsDiffLastPeriod, ~rankOverall, ~ptsBattingBehindFirst, ~ptsBatting, ~ptsDiffBattingLastPeriod,  ~abbr, ~roto_points5, ~value, ~diff, ~rank, ~rankPitching, ~ptsPitchingBehindFirst, ~ptsPitching, ~ptsDiffPitchingLastPeriod,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,  "OBP",            13, 0.3663,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,   "HR",            13,    384,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,  "RBI",            13,   1012,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,    "R",            13,   1102,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,   "SB",            13,    164,     0,     1,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,    "S",            12,     94,     0,     2,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,    "W",             6,     89,    -2,     8,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,    "K",            11,   1576,    -2,     3,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0,  "ERA",             8,  3.946,     0,     6,             3,                       5,           46,                         -4,
                                                    2,                    0,         111,                 -4,            1,                      0,          65,                         0, "WHIP",             9, 1.2179,     0,     5,             3,                       5,           46,                         -4,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,  "OBP",            12, 0.3576,     0,     2,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,   "HR",            11,    323,     0,     3,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,  "RBI",            11,    954,     0,     3,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,    "R",            12,   1011,     0,     2,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,   "SB",             6,     89,     0,     8,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,    "S",             2,     56,    -1,    12,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,    "W",           7.5,     91,   1.5,     6,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,    "K",            10,   1508,   0.5,     4,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0,  "ERA",            13,  3.688,     1,     1,             4,                     5.5,         45.5,                          2,
                                                    8,                 13.5,        97.5,                  2,            2,                     13,          52,                         0, "WHIP",            13, 1.1474,     0,     1,             4,                     5.5,         45.5,                          2
                                              )

Upvotes: 1

Views: 1252

Answers (1)

akrun
akrun

Reputation: 887223

An option is to loop through the 'dataBatting', 'dataPitching' column names, do the unnest_wider separately, unnest the columns of interest, and bind the rows together (map_dfr - suffix 'dfr' returns dataframe with rows binded together from a list of data.frames of tibbles). One thing that should be noted is that many of the tidyverse functions are type sensitive. Here, we find some list elements to have different types and this would have an issue in unnest unless the 'ptype' is mentioned. To avoid that, we can use type.convert to change the type automatically based on the values and then do the unnesting

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
map_dfr(c('dataBatting', 'dataPitching'), ~ 
         df %>% 
           unnest_wider(.x) %>%
           mutate_at(vars(c(abbr, roto_points, value, diff, rank)), 
                    type.convert) %>% 
           unnest(c(abbr, roto_points, value, diff, rank)) %>% 
           mutate_if(is.factor, as.character) %>%
           select(-one_of(c("dataBatting", "dataPitching")))) 

Upvotes: 3

Related Questions