Reputation: 345
I am working to extend the following example based on the question Using a dataframe to input values to a function and aggregating the outputs
I've added a third team (Mary + Bill), but am now having some problems because the "team3" contains a player that doesn't have any data, so the output is just "Mary"
The other issue I have is that the output of "team1" moves from "Mary + Frank" in the input in "teams" to "Frank + Mary" in the output (that is, the order is around the other way).
This makes it difficult to reconcile the output back to the teams.
I can make team1 and team2 work if I use regex and the start and finish of the teams$V2, but that doesn't help with missing data (ie. Bill).
Modified input:
input = structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"), V2 = c("Frank", "Mary", "John"), V3 = c("Sydney", "Sydney", "Sydney"), V4 = c(55, 76, 14)), row.names = c(NA, -3L), class = c("data.table", "data.frame"))
teams = structure(list(V1 = c("team1", "team2", "team3"), V2 = c("Mary + Frank","Mary + John", "Mary + Bill")), class = "data.frame", row.names = c(NA, -3L))
result = function(data, combination)
{
purrr::map_df(lapply(combination,function(x){paste(x,collapse = '|')}), function(x){
df = data[grepl(x,data$V2),] %>% group_by(V3) %>% summarize(V1= paste(V1[1]), V2= paste(V2,collapse = ' + '), V4= sum(V4))
data = df[,c(2,3,1,4)]
return(data)
}
)
}
apply(teams, 1, function(x) {
y <- list(str_split(x[2], ' \\+ ', simplify = T))
result(input, y)
}) %>% do.call("rbind", .)
Output
structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"),
V2 = c("Frank + Mary", "Mary + John", "Mary"), V3 = c("Sydney",
"Sydney", "Sydney"), V4 = c(131, 90, 76)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
Can anyone suggest how I can get to my desired output - either keeps the team1, team2 in the same order so I can join them back afterwards; or joins the team name as part of the input?
output1 = structure(list(V1 = c("Team_2022", "Team_2022", "Team_2022"),
V2 = c("Mary + Frank","Mary + John", "Mary + Bill"), V3 = c("Sydney",
"Sydney", "Sydney"), V4 = c(131, 90, 76)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
output2 = structure(list(V1 = c("team1", "team2", "team3"), V2 = c("Team_2022", "Team_2022", "Team_2022"),
V3 = c("Frank + Mary", "John + Mary","Mary + Bill"), V4 = c("Sydney",
"Sydney", "Sydney"), V5 = c(131, 90, 76)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
Or even (because I could at least tie the team3 back to missing values):
output3 = structure(list(V1 = c("team1", "team2", "team3"), V2 = c("Team_2022", "Team_2022", "Team_2022"),
V3 = c("Frank + Mary", "John + Mary","Mary"), V4 = c("Sydney", "Sydney", "Sydney"), V5 = c(69, 90, 14)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 0
Views: 31
Reputation: 66415
library(dplyr); library(tidyr)
teams %>%
separate_rows(V2) %>%
left_join(input, by = c("V2" = "V2")) %>%
replace_na(list(V4 = 0)) %>%
group_by(V1.x) %>% fill(V1.y, V3) %>%
summarize(V2 = paste(V2, collapse = " + "),
V3 = first(V3),
V4 = sum(V4))
Result
# A tibble: 3 × 4
V1.x V2 V3 V4
<chr> <chr> <chr> <dbl>
1 team1 Mary + Frank Sydney 131
2 team2 Mary + John Sydney 90
3 team3 Mary + Bill Sydney 76
Upvotes: 1