Prajwal

Reputation: 13

Y axis values different from actual column in dataset in R

I am currently working with the "world bank islands" dataset and trying to plot population against country, with the bars coloured by year. Below is the code I have written so far.

library(ggplot2)
options(scipen = 999)
bank <- read.csv("C:/Users/True Gamer/OneDrive/Desktop/world_bank_international_arrivals_islands.csv")
bank[bank == "" | bank == "."] <- NA
bank$country <- as.numeric(bank$country)
bank$year <- as.numeric(bank$year)
bank$areakm2 <- as.numeric(bank$areakm2)
bank$pop <- as.numeric(bank$pop)
bank$gdpnom <- as.numeric(bank$gdpnom)
bank$flights...WB <- as.numeric(bank$flights...WB)
bank$hotels <- as.numeric(bank$hotels)
bank$hotrooms <- as.numeric(bank$hotrooms)
bank$receipt <- as.numeric(bank$receipt)
bank$ovnarriv <- as.numeric(bank$ovnarriv)
bank$dayvisit <- as.numeric(bank$dayvisit)
bank$arram <- as.numeric(bank$arram)
bank$arreur <- as.numeric(bank$arreur)
bank$arraus <- as.numeric(bank$arraus)
str(bank)
plot1 <- ggplot(bank, aes(x = country, y = pop)) +
  geom_bar(stat = "identity", aes(fill = year)) +
  ggtitle("Population of each country yearwise") +
  xlab("Countries") +
  ylab("Population")
plot1

However, when I do this, the values shown on the y axis are different from the actual values in the pop column. This is the link to the dataset

Upvotes: 0

Views: 245

Answers (1)

Allan Cameron

Reputation: 174476

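The problem is that you are stacking the bars (this is the default behaviour), so the height of each bar is the sum of pop over all years for that country rather than any single value in the column. Also, geom_bar(stat = "identity") is just a long way of writing geom_col(). One further point to note is that since all your columns are numeric, the single line: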

bank <- as.data.frame(lapply(bank, as.numeric))

replaces all your individual numeric conversions.
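To make the stacking effect concrete, here is a minimal base R sketch. The toy data frame and its values are made up for illustration and are not taken from the World Bank file. It compares the total height a stacked bar would reach with the largest single value that a dodged bar would show.

```r
# Toy data (hypothetical values, not from the World Bank file):
# two countries, each observed in three years.
toy <- data.frame(
  country = rep(c(1, 2), each = 3),
  year    = rep(c(0, 1, 2), times = 2),
  pop     = c(100, 110, 120, 50, 55, 60)
)

# With stacking, the top of each bar sits at the sum of pop over
# all years for that country.
stacked_height <- tapply(toy$pop, toy$country, sum)

# With position = "dodge", each bar shows one row's pop, so the
# tallest bar per country is simply the maximum.
dodged_max <- tapply(toy$pop, toy$country, max)

print(stacked_height)
print(dodged_max)
```

If the y axis in the original plot tops out near the per-country sums rather than the per-row values, stacking is the likely cause.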

The plot you are trying to create would be something like this:

 ggplot(bank, aes(x = country, y = pop)) + 
   geom_col(aes(fill = factor(year)), position = "dodge")  + 
   ggtitle("Population of each country yearwise") + 
   xlab("Countries") + 
   ylab("Population") +
   labs(fill = "Year") +
   scale_y_continuous(labels = scales::comma) +
   scale_x_continuous(breaks = 1:27)


However, it would probably be best to present your data in a different way. Perhaps, if you are comparing population growth, something like this would be better:

 ggplot(bank, aes(x = year, y = pop)) + 
   geom_line(aes(color = factor(country))) + 
   ggtitle("Population of each country yearwise") + 
   xlab("Year") + 
   ylab("Population") +
   facet_wrap(.~country, scales = "free_y", nrow = 6) +
   scale_y_continuous(labels = scales::comma) +
   scale_x_continuous(breaks = c(0, 5, 10)) +
   theme_minimal() +
   theme(legend.position = "none")


Or with bars:

 ggplot(bank, aes(x = year, y = pop)) + 
   geom_col(aes(fill = factor(country)), position = "dodge")  + 
   ggtitle("Population of each country yearwise") + 
   xlab("Year") + 
   ylab("Population") +
   facet_wrap(.~country, scales = "free_y", nrow = 6) +
   scale_y_continuous(labels = scales::comma) +
   scale_x_continuous(breaks = c(0, 5, 10)) +
   theme_minimal() +
   theme(legend.position = "none")


Upvotes: 2
