Reputation: 645
I have a nested df
that I am trying to clean up.
Sample Data:
df <-
tibble::tribble(
~idTeam, ~ptsTotalBehindFirst, ~ptsOverall, ~ptsDiffLastPeriod, ~rankOverall, ~ptsBattingBehindFirst, ~ptsBatting, ~ptsDiffBattingLastPeriod, ~dataBatting, ~rankBatting, ~ptsPitchingBehindFirst, ~ptsPitching, ~ptsDiffPitchingLastPeriod, ~dataPitching, ~rankPitching,
"2", "0", "111", "-4", 1L, "0", "65", "0", list(abbr = c("OBP", "HR", "RBI", "R", "SB"), roto_points = c(13, 13, 13, 13, 13), value = c(0.3663, 384, 1012, 1102, 164), diff = c(0, 0, 0, 0, 0), rank = c(1, 1, 1, 1, 1)), 1L, "5", "46", "-4", list(abbr = c("S", "W", "K", "ERA", "WHIP"), roto_points = c(12, 6, 11, 8, 9), value = c(94, 89, 1576, 3.946, 1.2179), diff = c(0, -2, -2, 0, 0), rank = c(2, 8, 3, 6, 5)), 3L,
"8", "13.5", "97.5", "2", 2L, "13", "52", "0", list(abbr = c("OBP", "HR", "RBI", "R", "SB"), roto_points = c(12, 11, 11, 12, 6), value = c(0.3576, 323, 954, 1011, 89), diff = c(0, 0, 0, 0, 0), rank = c(2, 3, 3, 2, 8)), 3L, "5.5", "45.5", "2", list(abbr = c("S", "W", "K", "ERA", "WHIP"), roto_points = c(2, 7.5, 10, "13", 13), value = c(56, 91, 1508, 3.688, 1.1474), diff = c(-1, 1.5, 0.5, 1, 0), rank = c(12, 6, 4, 1, 1)), 4L
)
The data I am trying to unnest
is stored in the dataBatting
and dataPitching
columns. I am trying to unnest
all of the columns in both columns and bind the results as rows. Something akin to pivot_longer
but I wasn't sure of the right way to do this with 4 duplicate columns nested within 2 separate columns.
My attempt to do this was:
df %>%
unnest_wider(dataBatting) %>%
unnest(c(abbr, roto_points, value, diff, rank)) %>%
unnest_wider(dataPitching) %>%
unnest(c(abbr, roto_points, value, diff, rank))
Error is:
Error: Column names `abbr`, `roto_points`, `value`, `diff`, `rank` must not be duplicated.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace
My issue is that I want to bind the same columns from dataPitching that have the same column names as dataBatting (abbr, roto_points, value, diff, rank).
I also want to change the name of the columns that are duplicates. Is tidyr::hoist
a better way to do this?
Desired df
:
tibble::tribble(
~idTeam, ~ptsTotalBehindFirst, ~ptsOverall, ~ptsDiffLastPeriod, ~rankOverall, ~ptsBattingBehindFirst, ~ptsBatting, ~ptsDiffBattingLastPeriod, ~abbr, ~roto_points5, ~value, ~diff, ~rank, ~rankPitching, ~ptsPitchingBehindFirst, ~ptsPitching, ~ptsDiffPitchingLastPeriod,
2, 0, 111, -4, 1, 0, 65, 0, "OBP", 13, 0.3663, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "HR", 13, 384, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "RBI", 13, 1012, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "R", 13, 1102, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "SB", 13, 164, 0, 1, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "S", 12, 94, 0, 2, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "W", 6, 89, -2, 8, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "K", 11, 1576, -2, 3, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "ERA", 8, 3.946, 0, 6, 3, 5, 46, -4,
2, 0, 111, -4, 1, 0, 65, 0, "WHIP", 9, 1.2179, 0, 5, 3, 5, 46, -4,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "OBP", 12, 0.3576, 0, 2, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "HR", 11, 323, 0, 3, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "RBI", 11, 954, 0, 3, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "R", 12, 1011, 0, 2, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "SB", 6, 89, 0, 8, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "S", 2, 56, -1, 12, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "W", 7.5, 91, 1.5, 6, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "K", 10, 1508, 0.5, 4, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "ERA", 13, 3.688, 1, 1, 4, 5.5, 45.5, 2,
8, 13.5, 97.5, 2, 2, 13, 52, 0, "WHIP", 13, 1.1474, 0, 1, 4, 5.5, 45.5, 2
)
Upvotes: 1
Views: 1252
Reputation: 887223
An option is to loop through the 'dataBatting', 'dataPitching' column names, do the unnest_wider
separately, unnest
the columns of interest, and bind the rows together (map_dfr
- suffix 'dfr' returns dataframe with rows binded together from a list
of data.frames
of tibble
s). One thing that should be noted is that many of the tidyverse functions are type sensitive. Here, we find some list
elements to have different types and this would have an issue in unnest
unless the 'ptype' is mentioned. To avoid that, we can use type.convert
to change the type automatically based on the values and then do the unnest
ing
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
map_dfr(c('dataBatting', 'dataPitching'), ~
df %>%
unnest_wider(.x) %>%
mutate_at(vars(c(abbr, roto_points, value, diff, rank)),
type.convert) %>%
unnest(c(abbr, roto_points, value, diff, rank)) %>%
mutate_if(is.factor, as.character) %>%
select(-one_of(c("dataBatting", "dataPitching"))))
Upvotes: 3