Reputation: 196
I have a large dataset (800,000+ data points) with information about loans given by 5,000+ banks. I am trying to compare the number of loans disbursed by the top N banks (those that disburse the most loans) with the number disbursed by all the remaining banks together. For that, I made the dataframe banks, which is sorted by number of loans disbursed in descending order. I also added a column with the relative cumulative sum of loans disbursed. I was able to make a plot of this, but I am trying to make a histogram where the x axis is N, a number from 1 to 10, and the y axis is the percentage of loans disbursed by the top N banks. Each bar will be divided into sections of different colors. For example, the first bar would be one color and include the value of the first bank only; the second bar would be the cumulative sum of the top 2 banks and would have two colors, one for each bank, starting from the top bank.
As a concrete example, let's say I have a set of 100 loans, where the top 5 banks disbursed 20, 14, 12, 12, and 10 loans respectively.
Then the plot should be as follows for N going from 1 to 5:
If possible, it would also have a legend showing which bank corresponds to each color.
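To make the expected bar heights explicit, the concrete example can be checked with a few lines of base R. This is only a sketch of the arithmetic: the bank labels A to E are borrowed from the test data frame, and the variable names are illustrative. The total heights of the five bars come out as 20, 34, 46, 58, and 68 percent.

```r
# Loan counts from the example: top 5 banks out of 100 loans in total
loans <- c(A = 20, B = 14, C = 12, D = 12, E = 10)
total_loans <- 100

# Height of bar N = cumulative share (in percent) of the top N banks
cum_pct <- cumsum(loans) / total_loans * 100
names(cum_pct) <- paste0("N=", seq_along(loans))
print(cum_pct)
```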
I tried using ggplot, but it does not let me define the axes the specific way I want them.
I think this is not that hard, but I am a complete neophyte at using R, so I made the example plot using Excel and Paint. Thank you so much!
I made the following test data frame for the example plot using dput(), as per @sindri_baldur's suggestion:
structure(list(Bank.Name = structure(1:16, .Label = c("A", "B",
"C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
"P"), class = "factor"), Loans = c(20, 14, 12, 12, 10, 4, 3,
3, 3, 3, 3, 3, 3, 3, 3, 1)), class = "data.frame", row.names = c(NA,
-16L))
Upvotes: 0
Views: 561
Reputation: 1595
Try the following code. Your data is called bnk here.
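library(dplyr)
library(ggplot2)  # needed for ggplot() and geom_bar()

N <- 5

# create an empty tibble with the right column types
top_b <- tibble(topn = integer(), Bank.Name = character(), Loans = numeric())

# for each i in 1..N, append the top i banks, labelled with topn = i
for (i in 1:N) {
  top_b <- top_b %>%
    bind_rows(bind_cols(topn = rep(i, i), head(bnk, i)))
}

# reverse the factor levels so the top bank sits at the bottom of each bar
top_b$Bank.Name <- factor(top_b$Bank.Name,
                          levels = unique(top_b$Bank.Name)[N:1])

# stacked bars: one bar per topn, one colored segment per bank
top_b %>%
  ggplot(aes(x = topn, y = Loans, fill = Bank.Name)) +
  geom_bar(stat = 'identity')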
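The code above plots raw loan counts, which happen to equal percentages only because the test data sums to 100 loans. For the real dataset, a variant along the following lines might be closer to what the question asks for. It is a sketch, assuming the y axis should show each bank's share of all loans; the names Pct and top_pct are made up for illustration, and the test data is rebuilt inline so the block runs on its own.

```r
library(ggplot2)

# Test data from the question (Bank.Name kept as plain character here)
bnk <- data.frame(
  Bank.Name = LETTERS[1:16],
  Loans = c(20, 14, 12, 12, 10, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1),
  stringsAsFactors = FALSE
)
N <- 5

# Share of all loans (not only the top N) disbursed by each bank, in percent
bnk$Pct <- bnk$Loans / sum(bnk$Loans) * 100

# One block of rows per bar: bar i holds the top i banks
top_pct <- do.call(rbind, lapply(seq_len(N), function(i) {
  cbind(topn = i, head(bnk, i))
}))

# Reverse the levels so the top bank ends up at the bottom of each bar
top_pct$Bank.Name <- factor(top_pct$Bank.Name,
                            levels = rev(bnk$Bank.Name[seq_len(N)]))

# factor(topn) gives a discrete x axis with one tick per N
p <- ggplot(top_pct, aes(x = factor(topn), y = Pct, fill = Bank.Name)) +
  geom_col() +
  labs(x = "Top N banks", y = "Loans disbursed (%)", fill = "Bank")
print(p)
```

Here geom_col() is shorthand for geom_bar(stat = 'identity'), and the fill legend is what maps each color to a bank.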
Upvotes: 0