Reputation: 153
I have data from a sports team tracking wins/losses versus other teams with the following structure:
Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss ...
1 1 0 1 NA NA NA
2 1 1 NA NA NA 1
3 2 1 NA NA 1 NA
4 2 2 NA 1 NA NA
5 3 2 NA NA 1 NA
...
I want to create a factor variable that includes the team the game was played against so that the data looks like this:
Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss Team
1 1 0 1 NA NA NA Team1
2 1 1 NA NA NA 1 Team2
3 2 1 NA NA 1 NA Team2
4 2 2 NA 1 NA NA Team1
5 3 2 NA NA 1 NA Team2
...
My thinking (NOT workable code) is essentially this:
if (Team1Win == 1 | Team1Loss == 1), Team = "Team1"
if (Team2Win == 1 | Team2Loss == 1), Team = "Team2"
I'm really struggling with how to do this in dplyr using mutate. I've tried various approaches with ifelse, recode, etc., but I either keep getting errors or results that are not what I want.
What is correct and most efficient way to make this work in dplyr?
Upvotes: 2
Views: 3968
Reputation: 8413
this could be what you are looking for. ( not hardcoded for just 2 teams)
# solution 1 :
paste0("Team",ceiling(apply(df[-c(1:3)], 1, function(x) which(!is.na(x)))/2))
[1] "Team1" "Team2" "Team2" "Team1" "Team2"
# solution 2: using a apply() {basically a for loop itself]
apply(df[-c(1:3)], 1, function(x) gsub("(Team\\d)+.*", "\\1", colnames(df[-c(1:3)])[which(!is.na(x))]))
# [1] "Team1" "Team2" "Team2" "Team1" "Team2"
# solution 3: (long route to dplyr) [ you have indirectly taught me a lot in dplyr through my search for this solution]
func <- function(x){
y = which(x == 1) # get the location of where 1 appears
z = rep(0, times = length(x)) # create a vector of 0's+location of 1
z[y] = y # i.e. c(0,0,3,0,5) for Team2Win
z
}
df1 = df[-c(1:3)] %>% gather("key", "value", starts_with("Team")) %>%
group_by(key) %>%
dplyr::mutate(x = func(value)) %>%
filter(x != 0) %>% arrange(x) %>% select(key)
df$newcol = gsub("(Team\\d+).*", "\\1", df1$key)
Upvotes: 0
Reputation: 3947
Along similar lines to other answers, but with some useful changes:
na.rm = TRUE
in gather
sub
works just fine, no need for stringr
full_join
.library(dplyr)
library(tidyr)
df = read.delim(text =
"Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss
1 1 0 1 NA NA NA
2 1 1 NA NA NA 1
3 2 1 NA NA 1 NA
4 2 2 NA 1 NA NA
5 3 2 NA NA 1 NA", sep = " ")
df %>%
select(-starts_with("Total")) %>%
gather(Team, one, -Game, na.rm = TRUE) %>%
select(-one) %>%
mutate(Team = sub("Win|Loss", "", Team)) %>%
full_join(df, .)
#> Joining, by = "Game"
#> Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss Team
#> 1 1 1 0 1 NA NA NA Team1
#> 2 2 1 1 NA NA NA 1 Team2
#> 3 3 2 1 NA NA 1 NA Team2
#> 4 4 2 2 NA 1 NA NA Team1
#> 5 5 3 2 NA NA 1 NA Team2
Upvotes: 2
Reputation: 1549
I'm a sucker for dplyr
approaches to things at the moment so I offer a solution using dplyr
that extends to however many teams you might have. It does also use tidyr
and stringr
as helpfully pointed out in the comment by apom below.
library(dplyr)
library(tidyr)
library(stringr)
df = read_delim(
"Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss
1 1 0 1 NA NA NA
2 1 1 NA NA NA 1
3 2 1 NA NA 1 NA
4 2 2 NA 1 NA NA
5 3 2 NA NA 1 NA",delim = " ")
df %>%
gather("Team",value,contains("Team")) %>%
filter(!is.na(value)) %>%
mutate(Team = str_replace_all(Team,c("Win" = "","Loss" = ""))) %>%
select(-value)
Upvotes: 1
Reputation: 287
I'm pretty sure that you have more than two teams in your data, and team names are not generic. What you want to do is to reshape the data to long form first and extract the relevant team name. So you may want to proceed as follows.
library(dplyr)
library(tidyr)
new_df <- df %>%
gather(team,idx,Team1Win:Team100Loss) %>%
filter(!is.na(idx)) %>%
select(-idx) %>%
mutate(team = gsub("Win|Loss","",team))
If you want to keep those wide columns, then you can join the new DF to the old one.
Upvotes: 0
Reputation: 606
You can do it using a simple loop:
x = colnames(df)
df$team<- NA
for (i in 1:nrow(df))
{
df$team[i] = x[which(df[i,]==1)]
}
And then at the end you can trim "win" and "loss" using following function:
df$team<- gsub("win", "",df$team)
df$team<- gsub("loss", "",df$team)
Upvotes: 0