mtfurlan
mtfurlan

Reputation: 1062

R make new data frame from current one

I'm trying to calculate the best goal differentials in the group stage of the 2014 world cup.

football <- read.csv(
    file="http://pastebin.com/raw.php?i=iTXdPvGf", 
    header = TRUE, 
    strip.white = TRUE
)
football <- head(football,n=48L)
football[which(max(abs(football$home_score - football$away_score)) == abs(football$home_score - football$away_score)),]

Results in

home home_continent home_score        away away_continent away_score result
4  Cameroon         Africa          0     Croatia         Europe          4      l
7     Spain         Europe          1 Netherlands         Europe          5      l
37  Germany  

So those are the games with the highest goal differntial, but now I need to make a new data frame that has a team name, and abs(football$home_score-football$away_score)

Upvotes: 0

Views: 111

Answers (3)

akrun
akrun

Reputation: 887048

Here is another option without using ifelse for creating the "winner" column. This is based on row/column indexes. The numeric column index is created by matching the result column with its unique elements (match(football$result,..), and the row index is just 1:nrow(football). Subset the "football" dataset with columns 'home', 'away' and cbind it with an additional column 'draw' with NAs so that the 'd' elements in "result" change to NA.

football$score_diff <- abs(football$home_score - football$away_score)
football$winner <-  cbind(football[c('home', 'away')],draw=NA)[ 
    cbind(1:nrow(football), match(football$result, c('w', 'l', 'd')))]

football[with(football, score_diff==max(score_diff)),]
#  home home_continent home_score    away away_continent away_score   result
 #60 Brazil  South America          1 Germany         Europe          7    l
 #   score_diff  winner
 #60          6 Germany

If the dataset is very big, you could speed up the match by using chmatch from library(data.table)

library(data.table)
chmatch(as.character(football$result), c('w', 'l', 'd'))

NOTE: I used the full dataset in the link

Upvotes: 1

jazzurro
jazzurro

Reputation: 23574

You could save some typing in this way. You first get score differences and winners. When the result indicates w, home is the winner. So you do not have to look into scores at all. Once you add the score difference and winner, you can subset your data by subsetting data with max().

mydf <- read.csv(file="http://pastebin.com/raw.php?i=iTXdPvGf", 
                 header = TRUE, strip.white = TRUE)
mydf <- head(mydf,n = 48L)

library(dplyr)

mutate(mydf, scorediff = abs(home_score - away_score),
             winner = ifelse(result == "w", as.character(home),
                         ifelse(result == "l", as.character(away), "draw"))) %>%
filter(scorediff == max(scorediff))

#      home home_continent home_score        away away_continent away_score result scorediff      winner
#1 Cameroon         Africa          0     Croatia         Europe          4      l         4     Croatia
#2    Spain         Europe          1 Netherlands         Europe          5      l         4 Netherlands
#3  Germany         Europe          4    Portugal         Europe          0      w         4     Germany

Upvotes: 1

Mahrtynas
Mahrtynas

Reputation: 106

 football$score_diff <- abs(football$home_score - football$away_score)
 football$winner <- ifelse(football$home_score > football$away_score, as.character(football$home), 
                      ifelse(football$result == "d", NA, as.character(football$away)))

Upvotes: 2

Related Questions