Chelsea

R Grouped Bar Plots with Conditions

I am trying to compare two categorical variables and create a grouped bar graph of the counts for each combination of their values. The Churn column is either "Yes" or "No". The Contract column can be "Month-to-Month", "One Year", or "Two Years". What I ultimately want is a grouped bar graph showing the total number of Yeses and Nos for each Contract type. For example, the Month-to-Month contract type has 2220 Nos and 1655 Yeses in the Churn column.
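
The summary described here is a cross-tabulation. A minimal base-R sketch, assuming `cd` is a data frame with `Contract` and `Churn` columns (the small `cd` below is hypothetical toy data, not the real dataset): `table()` counts every Contract/Churn combination in one call.

```r
# Hypothetical toy stand-in for the real 7032-row `cd` data frame
cd <- data.frame(
  Contract = factor(c("Month-to-Month", "Month-to-Month", "Month-to-Month",
                      "One Year", "One Year", "Two Years", "Two Years"),
                    levels = c("Month-to-Month", "One Year", "Two Years")),
  Churn = factor(c("No", "Yes", "Yes", "No", "Yes", "No", "No"),
                 levels = c("No", "Yes"))
)

# One call counts every Contract/Churn combination:
# rows are Contract levels, columns are Churn values
counts <- table(cd$Contract, cd$Churn)
print(counts)
```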

I have to compare Churn to two other columns of a similar nature, so at first I tried to write a function that looped through the levels of each column, pulled out the counts, and appended them to a vector. Then I read that growing a vector inside a loop is not considered good practice in R.
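
One way to reuse the same logic for several columns without appending inside a loop is to wrap the counting in a function and let `lapply()` build the whole result list at once. This is a sketch under assumptions: `churnCounts` is a hypothetical helper name, and `Plan` is an invented stand-in for the other columns, which the question does not name.

```r
# Hypothetical toy data; `Plan` is an invented stand-in for the other columns
cd <- data.frame(
  Contract = c("Month-to-Month", "Month-to-Month", "One Year", "Two Years"),
  Plan     = c("Basic", "Premium", "Premium", "Basic"),
  Churn    = c("Yes", "No", "No", "No")
)

# Hypothetical helper: cross-tabulate one column against Churn.
# Nothing is grown inside a loop; table() does all the counting.
churnCounts <- function(df, column) {
  table(df[[column]], df$Churn, dnn = c(column, "Churn"))
}

# lapply() returns one table per column in a single named list
columns <- c("Contract", "Plan")
results <- setNames(lapply(columns, churnCounts, df = cd), columns)
results$Contract
```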

So I went about it the long way:

# Levels of Contract; factor() keeps this working if Contract is a character
# column (the read.csv() default since R 4.0.0)
contractLevels <- levels(factor(cd$Contract))
# Count the No/Yes Churn rows for each Contract level, one level at a time
c1n <- length(cd$Contract[which(cd$Churn == "No" & cd$Contract == contractLevels[1])])
c1y <- length(cd$Contract[which(cd$Churn == "Yes" & cd$Contract == contractLevels[1])])
c2n <- length(cd$Contract[which(cd$Churn == "No" & cd$Contract == contractLevels[2])])
c2y <- length(cd$Contract[which(cd$Churn == "Yes" & cd$Contract == contractLevels[2])])
c3n <- length(cd$Contract[which(cd$Churn == "No" & cd$Contract == contractLevels[3])])
c3y <- length(cd$Contract[which(cd$Churn == "Yes" & cd$Contract == contractLevels[3])])
# Bar heights in the order: level 1 No, level 1 Yes, level 2 No, ...
cv <- c(c1n, c1y, c2n, c2y, c3n, c3y)
cv
# Matching bar labels, e.g. "Month-to-Month No"
cn <- c(paste(contractLevels[1], "No"), paste(contractLevels[1], "Yes"),
        paste(contractLevels[2], "No"), paste(contractLevels[2], "Yes"),
        paste(contractLevels[3], "No"), paste(contractLevels[3], "Yes"))
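
For comparison, a sketch that should produce the same `cv` and `cn` without one line per level, assuming Churn's levels are exactly "No" and "Yes" in that order (the toy `cd` below is hypothetical). Transposing the table before flattening makes the counts come out level by level as No, Yes, matching the c1n, c1y, c2n, ... order above.

```r
# Hypothetical toy data standing in for `cd`
cd <- data.frame(
  Contract = factor(c("Month-to-Month", "Month-to-Month", "Month-to-Month",
                      "One Year", "One Year", "Two Years", "Two Years")),
  Churn = factor(c("No", "Yes", "Yes", "No", "Yes", "No", "No"))
)
contractLevels <- levels(cd$Contract)

# table() gives Contract x Churn counts; t() puts Churn in the rows so that
# as.vector(), which reads column by column, yields No, Yes for each level
cv <- as.vector(t(table(cd$Contract, cd$Churn)))

# Labels in the same order: each level twice, paired with "No" then "Yes"
cn <- paste(rep(contractLevels, each = 2), c("No", "Yes"))
cv
cn
```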

I still wanted to make this as easy as possible to reuse, so I didn't type out the actual new column names (cn). First, there has to be an easier way to do the above, and I'm just too new to R to figure it out. Second, I can't turn this data into a grouped bar graph. I was trying to follow this: http://www.r-graph-gallery.com/48-grouped-barplot-with-ggplot2/ but because the cn and cv vectors do not have 7032 "rows" (like my data does), it doesn't work.
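
The gallery example expects long-format data: one row per bar, with a column for the group, a column for the sub-group, and a column for the value. It does not need one row per observation, so the 7032-row mismatch can be avoided by summarising first. A minimal sketch, assuming the same `cd` layout (the toy data is hypothetical): `as.data.frame()` applied to a two-way `table` returns exactly that shape, with the counts in a column named `Freq`.

```r
# Hypothetical toy data standing in for `cd`
cd <- data.frame(
  Contract = c("Month-to-Month", "Month-to-Month", "One Year", "Two Years", "Two Years"),
  Churn = c("No", "Yes", "Yes", "No", "No")
)

# One row per Contract/Churn combination (zero counts included);
# the naming arguments become column names, the counts go into Freq
counts <- as.data.frame(table(Contract = cd$Contract, Churn = cd$Churn))
print(counts)
```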

Is it possible to say: for each level of column X, graph the total number of times column Y is "Yes" beside the total number of times it is "No"? I have been experimenting with rpart, plot, and ggplot trying to figure this out.
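
With ggplot2 this can be expressed almost literally, because `geom_bar()` counts the rows itself, so the raw data can be passed in directly. A sketch, assuming ggplot2 is installed and using hypothetical toy data in place of the real `cd`: `fill` splits each Contract bar by Churn, and `position = "dodge"` puts the Yes and No bars side by side instead of stacking them.

```r
library(ggplot2)

# Hypothetical toy data; in practice this would be the full `cd`
cd <- data.frame(
  Contract = c("Month-to-Month", "Month-to-Month", "Month-to-Month",
               "One Year", "One Year", "Two Years", "Two Years"),
  Churn = c("No", "Yes", "Yes", "No", "Yes", "No", "Yes")
)

# geom_bar() counts rows per Contract/Churn pair; "dodge" groups the bars
p <- ggplot(cd, aes(x = Contract, fill = Churn)) +
  geom_bar(position = "dodge") +
  labs(y = "Number of customers")
print(p)
```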

Just doing plot(cd$Contract, cd$Churn) gives me a stacked graph that is roughly what I want, except it is hard to read. Doing barplot(cv, ylab="Churn", names.arg=cn, cex.names=0.5, las=2) gives me a bar chart that isn't grouped and is also hard to read.
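
Base R's `barplot()` can also draw grouped bars if it is given a matrix of counts instead of the flat `cv` vector. A sketch with hypothetical toy data: with `beside = TRUE`, each column of the matrix becomes a group along the x-axis and each row becomes one bar inside the group.

```r
# Hypothetical toy data standing in for `cd`
cd <- data.frame(
  Contract = c("Month-to-Month", "Month-to-Month", "Month-to-Month",
               "One Year", "One Year", "Two Years", "Two Years"),
  Churn = c("No", "Yes", "Yes", "No", "Yes", "No", "No")
)

# Rows (Churn) become the bars inside each group,
# columns (Contract) become the groups along the x-axis
counts <- table(cd$Churn, cd$Contract)

# beside = TRUE draws grouped bars instead of stacking them;
# legend.text = TRUE builds the legend from the row names (No/Yes)
mids <- barplot(counts, beside = TRUE, legend.text = TRUE,
                xlab = "Contract", ylab = "Number of customers")
```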

Answers (1)

Dror Bogin

I think the best course of action is to create a new vector with just the sums you want to display, create another vector with the bar names in the same order, and combine the two into a data frame. Then use the grouped method from the source you provided. Taking the example from there: Condition becomes ("yes","no","yes","no","yes","no"), listed in the same order as your sums; Species becomes the contract type; and value is the sum you want to display. A data frame built this way works with the given example.
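
A sketch of this approach, assuming ggplot2 is installed. The sums are hypothetical toy numbers in the No/Yes order the question's code produces, and the column names `contract`, `condition`, and `value` are illustrative choices. `stat = "identity"` tells `geom_bar()` to use the pre-computed values as bar heights instead of counting rows.

```r
library(ggplot2)

# Hypothetical toy sums in the question's order: c1n, c1y, c2n, c2y, c3n, c3y
cv <- c(10, 4, 7, 2, 9, 1)
contractLevels <- c("Month-to-Month", "One Year", "Two Years")

plotData <- data.frame(
  contract  = rep(contractLevels, each = 2),   # groups along the x-axis
  condition = rep(c("No", "Yes"), times = 3),  # must follow the same order as cv
  value     = cv                               # pre-computed bar heights
)

# stat = "identity" plots value as given; "dodge" places the bars side by side
p <- ggplot(plotData, aes(x = contract, y = value, fill = condition)) +
  geom_bar(position = "dodge", stat = "identity")
print(p)
```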