Sri Sreshtan

Reputation: 545

How to categorize the data and plot a graph

The data set is available at https://www.kaggle.com/nowke9/ipldata.

This is an exploratory data analysis performed on two IPL data sets. I want to establish the relationship between the team that won the toss and the team that won the match, using the matches data set. When I split the data into winners and losers with ifelse() and plot the result, I get a single bar containing the total number of matches, and all of it falls under the "Lost" legend entry.

Here is the code:

library(tidyverse)

deliveries_tbl <- read.csv("data/deliveries_updated.csv")
matches_tbl <- read.csv("data/matches_updated.csv")

matches_normal_result_tbl <- matches_tbl[matches_tbl$result == "normal",]

# Is winning the toss really an advantage? ----
matches_normal_result_tbl$toss_match <- ifelse(as.character(matches_normal_result_tbl$toss_winner)== 
                                                    as.character(matches_normal_result_tbl$winner), 
                                                    "Won", "Lost")

ggplot(matches_normal_result_tbl[which(!is.na(matches_normal_result_tbl$toss_match)),], aes(toss_match, fill = toss_match))+
    geom_bar()+
    xlab("Toss")+ ylab("Number of matches won")+
    ggtitle("How much of advantage is winning the toss ?")

The output is as follows:

[Plot: Is winning the toss an advantage?]

How can I split the data into winner and loser categories so that the plot shows two bars? Many thanks in advance.

Upvotes: 2

Views: 122

Answers (1)

Ronak Shah

Reputation: 389175

To count the matches that the toss winner won or lost, you can do:

library(dplyr)
library(ggplot2)

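matches %>%
  # Label each match by whether the toss winner also won the match
  mutate(toss_match = ifelse(toss_winner == winner, "Won", "Loss")) %>%
  # One row per label, with the number of matches in column n
  count(toss_match) %>%
  ggplot() + aes(toss_match, n, fill = toss_match) + 
  # geom_col() plots the precomputed counts as bar heights
  geom_col() + 
  xlab("Toss")+ ylab("Number of matches won")+
  ggtitle("How much of advantage is winning the toss ?")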

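Before plotting, it can help to check the counts on a small table where the expected result is obvious. The sketch below uses a hypothetical `toy_matches` data frame (not the Kaggle data) with the same `toss_winner` and `winner` column names, and one row with a missing `winner` to show what happens to matches without a result.

```r
library(dplyr)

# Hypothetical toy data standing in for the matches table
toy_matches <- data.frame(
  toss_winner = c("A", "B", "A", "C", "B", "C"),
  winner      = c("A", "A", "A", "B", "B", NA),
  stringsAsFactors = FALSE
)

toss_counts <- toy_matches %>%
  # The comparison yields NA when winner is missing, so ifelse() returns NA too
  mutate(toss_match = ifelse(toss_winner == winner, "Won", "Loss")) %>%
  # count() keeps NA as its own group, which makes missing results visible
  count(toss_match)

print(toss_counts)

# Drop the NA group before plotting if matches without a winner should be ignored
toss_counts_clean <- filter(toss_counts, !is.na(toss_match))
```

If the counts from the real data show only one label here, the problem lies in the comparison itself rather than in the plotting code.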

You can go further and do the same analysis per city, keeping only the cities where every outcome count is greater than 10.

matches %>%
  mutate(toss_match = ifelse(toss_winner == winner, "Won", "Loss")) %>%
  # Count matches for each city and outcome
  count(city, toss_match) %>%
  group_by(city) %>%
  # Keep a city only if all of its outcome counts exceed 10
  filter(all(n > 10)) %>%
  # Convert counts to percentages within each city
  mutate(n = n/sum(n) * 100) %>%
  ggplot() + aes(city, n, fill = toss_match) + 
  geom_col() + 
  xlab("City")+ ylab("Percentage") + 
  ggtitle("Advantage of winning toss in each city")

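To see what the per-city percentage step produces, here is a minimal sketch on a hypothetical `toy_matches` data frame (invented values, not the Kaggle data). The `filter(all(n > 10))` threshold is left out because the toy table is too small to pass it, and the percentage goes into a new `pct` column instead of overwriting `n`.

```r
library(dplyr)

# Hypothetical toy data: city X has mixed outcomes, city Y only toss-winner wins
toy_matches <- data.frame(
  city        = c("X", "X", "X", "X", "Y", "Y"),
  toss_winner = c("A", "B", "A", "B", "A", "B"),
  winner      = c("A", "A", "B", "B", "A", "B"),
  stringsAsFactors = FALSE
)

city_pct <- toy_matches %>%
  mutate(toss_match = ifelse(toss_winner == winner, "Won", "Loss")) %>%
  count(city, toss_match) %>%
  group_by(city) %>%
  # sum(n) is taken within each city because the data is grouped by city
  mutate(pct = n / sum(n) * 100) %>%
  ungroup()

print(city_pct)
```

Within each city the percentages add up to 100, so the stacked bars in the plot all reach the same height and only the split between the two colours differs.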

Upvotes: 2
