Ash Levitt
Ash Levitt

Reputation: 153

Using dplyr to mutate factor variable conditional on numeric variables

I have data from a sports team tracking wins/losses versus other teams with the following structure:

Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss ...
1    1         0           1        NA        NA       NA
2    1         1           NA       NA        NA       1
3    2         1           NA       NA        1        NA
4    2         2           NA       1         NA       NA
5    3         2           NA       NA        1        NA
...

I want to create a factor variable that includes the team the game was played against so that the data looks like this:

Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss Team
1    1         0           1        NA        NA       NA        Team1
2    1         1           NA       NA        NA       1         Team2
3    2         1           NA       NA        1        NA        Team2
4    2         2           NA       1         NA       NA        Team1
5    3         2           NA       NA        1        NA        Team2
...

My thinking (NOT workable code) is essentially this:

if (Team1Win == 1 | Team1Loss == 1), Team = "Team1"
if (Team2Win == 1 | Team2Loss == 1), Team = "Team2"

I'm really struggling with how to do this in dplyr using mutate. I've tried various approaches with ifelse, recode, etc., but I either keep getting errors or results that are not what I want.

What is correct and most efficient way to make this work in dplyr?

Upvotes: 2

Views: 3968

Answers (5)

joel.wilson
joel.wilson

Reputation: 8413

this could be what you are looking for. ( not hardcoded for just 2 teams)

# solution 1 :
paste0("Team",ceiling(apply(df[-c(1:3)], 1, function(x) which(!is.na(x)))/2))
[1] "Team1" "Team2" "Team2" "Team1" "Team2"

# solution 2: using a apply() {basically a for loop itself]
apply(df[-c(1:3)], 1, function(x) gsub("(Team\\d)+.*", "\\1", colnames(df[-c(1:3)])[which(!is.na(x))]))
# [1] "Team1" "Team2" "Team2" "Team1" "Team2"

# solution 3: (long route to dplyr) [ you have indirectly taught me a lot in dplyr through my search for this solution]
func <- function(x){
  y = which(x == 1)             # get the location of where 1 appears
  z = rep(0, times = length(x)) # create a vector of 0's+location of 1
  z[y] = y                      # i.e. c(0,0,3,0,5) for Team2Win 
  z
}

df1 = df[-c(1:3)] %>% gather("key", "value", starts_with("Team")) %>% 
                group_by(key) %>%
                dplyr::mutate(x = func(value)) %>%
                filter(x != 0) %>% arrange(x) %>% select(key)

df$newcol = gsub("(Team\\d+).*", "\\1", df1$key)

Upvotes: 0

Jonathan Carroll
Jonathan Carroll

Reputation: 3947

Along similar lines to other answers, but with some useful changes:

  • na.rm = TRUE in gather
  • sub works just fine, no need for stringr
  • includes the full data as per the goal, using a full_join.
library(dplyr)
library(tidyr)

df = read.delim(text = 
"Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss
1 1 0 1 NA NA NA
2 1 1 NA NA NA 1
3 2 1 NA NA 1 NA
4 2 2 NA 1 NA NA
5 3 2 NA NA 1 NA", sep = " ")

df %>% 
  select(-starts_with("Total")) %>% 
  gather(Team, one, -Game, na.rm = TRUE) %>% 
  select(-one) %>% 
  mutate(Team = sub("Win|Loss", "", Team)) %>% 
  full_join(df, .)

#> Joining, by = "Game"
#>   Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss  Team
#> 1    1         1           0        1        NA       NA        NA Team1
#> 2    2         1           1       NA        NA       NA         1 Team2
#> 3    3         2           1       NA        NA        1        NA Team2
#> 4    4         2           2       NA         1       NA        NA Team1
#> 5    5         3           2       NA        NA        1        NA Team2

Upvotes: 2

jamieRowen
jamieRowen

Reputation: 1549

I'm a sucker for dplyr approaches to things at the moment so I offer a solution using dplyr that extends to however many teams you might have. It does also use tidyr and stringr as helpfully pointed out in the comment by apom below.

library(dplyr)
library(tidyr)
library(stringr)

df = read_delim(
"Game TotalWins TotalLosses Team1Win Team1Loss Team2Win Team2Loss
1 1 0 1 NA NA NA
2 1 1 NA NA NA 1
3 2 1 NA NA 1 NA
4 2 2 NA 1 NA NA
5 3 2 NA NA 1 NA",delim = " ")

df %>% 
  gather("Team",value,contains("Team")) %>% 
  filter(!is.na(value)) %>%
  mutate(Team = str_replace_all(Team,c("Win" = "","Loss" = ""))) %>%
  select(-value)

Upvotes: 1

Julius
Julius

Reputation: 287

I'm pretty sure that you have more than two teams in your data, and team names are not generic. What you want to do is to reshape the data to long form first and extract the relevant team name. So you may want to proceed as follows.

library(dplyr)
library(tidyr)
new_df <- df %>% 
  gather(team,idx,Team1Win:Team100Loss) %>%
  filter(!is.na(idx)) %>%
  select(-idx) %>%
  mutate(team = gsub("Win|Loss","",team))

If you want to keep those wide columns, then you can join the new DF to the old one.

Upvotes: 0

saurav shekhar
saurav shekhar

Reputation: 606

You can do it using a simple loop:

x = colnames(df)
df$team<- NA

for (i in 1:nrow(df))
{
 df$team[i] = x[which(df[i,]==1)]
}

And then at the end you can trim "win" and "loss" using following function:

df$team<- gsub("win", "",df$team)
df$team<- gsub("loss", "",df$team)

Upvotes: 0

Related Questions