Creating a stacked chart

Question

I have two tables that store users' login attempts. One table contains all successful logins and the other contains failed attempts. I'm trying to create a stacked chart from the failed and successful login counts. This is what my tables look like:

Success_login Table:

User_ID  Site_Address  Login_Attempts
1        xxx.xxx.xxx   5
2        xxx.xxy.yyy   10

Fail_login Table:

User_ID  Site_Address  Login_Attempts
1        xxx.xxx.xxx   2
2        xxx.xxy.yyy   8

How do I use the Login_Attempts columns of these two tables to create a stacked chart that highlights successful and failed attempts? I looked online and found this code:

# Stacked Bar Plot with Colors and Legend
counts <- table(mtcars$vs, mtcars$gear)
barplot(counts, main="Car Distribution by Gears and VS",
        xlab="Number of Gears", col=c("darkblue","red"),
        legend = rownames(counts))

However, it does not work, because my two tables have different numbers of records. I would appreciate it if you could guide me to a solution.

Thanks

bgoldst · Accepted Answer

Discussion

First you have to unify your data into a single table. This can be done with a full outer join, if you're familiar with SQL; see "How to join (merge) data frames (inner, outer, left, right)?" for details. The resulting NAs (for records which failed to join to the opposite table) must be replaced with zeroes in order for the final call to barplot() to work.
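
To see the join and the NA replacement in isolation, here is a minimal sketch on two hypothetical miniature tables (the names ok, bad, joined and count_cols are illustrative only). It assumes the tables share a User_ID key and replaces NAs only in the count columns:

```r
# Hypothetical miniature tables: user 3 only succeeded, user 4 only failed.
ok  <- data.frame(User_ID = c(1, 3), Login_Attempts = c(5, 3))
bad <- data.frame(User_ID = c(1, 4), Login_Attempts = c(2, 4))

# all = TRUE keeps the rows of both tables (a full outer join).
joined <- merge(ok, bad, by = "User_ID",
                suffixes = c(".successful", ".failed"), all = TRUE)
print(joined)  # unmatched rows carry NA in the other table's count column

# Replace NA with 0 in the count columns only, leaving the key untouched.
count_cols <- c("Login_Attempts.successful", "Login_Attempts.failed")
joined[count_cols][is.na(joined[count_cols])] <- 0
print(joined)
```

Restricting the replacement to the count columns is a design choice for this sketch; it avoids writing zeroes into key columns if those ever contain missing values.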

You must then derive a matrix in the format required by barplot() for producing stacked bar charts, which can be done pretty easily with a single call to matrix(). Taking care to set labels/titles/legends/colors correctly, you can get a nice stacked bar chart:

Code

# Sample data: user 3 has only successful logins, user 4 only failed ones.
s <- data.frame(User_ID=c(1,2,3), Site_Address=c('xxx.xxx.xxx','xxx.xxy.yyy','xxx.yyy.zzz'), Login_Attempts=c(5,10,3))
f <- data.frame(User_ID=c(1,2,4), Site_Address=c('xxx.xxx.xxx','xxx.xxy.yyy','xxx.yyy.zzz'), Login_Attempts=c(2,8,4))
# Full outer join; rows without a match get NA, which is then replaced with 0.
all <- merge(s, f, by=c('User_ID','Site_Address'), suffixes=c('.successful','.failed'), all=TRUE)
all[is.na(all)] <- 0
# barplot() expects one row per stacked series and one column per bar.
stackData <- matrix(c(all$Login_Attempts.failed, all$Login_Attempts.successful), 2, byrow=TRUE)
colnames(stackData) <- paste0(all$User_ID, '@', all$Site_Address)
rownames(stackData) <- c('failed','successful')
barplot(stackData, main='Successful and failed login attempts', xlab='User_ID@Site_Address', ylab='Login_Attempts', col=c('red','blue'), legend=rownames(stackData))

Resulting data

r> s;
  User_ID Site_Address Login_Attempts
1       1  xxx.xxx.xxx              5
2       2  xxx.xxy.yyy             10
3       3  xxx.yyy.zzz              3
r> f;
  User_ID Site_Address Login_Attempts
1       1  xxx.xxx.xxx              2
2       2  xxx.xxy.yyy              8
3       4  xxx.yyy.zzz              4
r> all;
  User_ID Site_Address Login_Attempts.successful Login_Attempts.failed
1       1  xxx.xxx.xxx                         5                     2
2       2  xxx.xxy.yyy                        10                     8
3       3  xxx.yyy.zzz                         3                     0
4       4  xxx.yyy.zzz                         0                     4
r> stackData;
           1@xxx.xxx.xxx 2@xxx.xxy.yyy 3@xxx.yyy.zzz 4@xxx.yyy.zzz
failed                 2             8             0             4
successful             5            10             3             0

Output

bar-chart

Edit: It's a little strange to create a one-bar stacked bar chart, but ok, here's how you can do it, using the above data (all) as a base:

barplot(matrix(c(sum(all$Login_Attempts.failed),sum(all$Login_Attempts.successful))),main='Successful and failed login attempts',ylab='Login_Attempts',col=c('red','blue'),legend=c('failed','successful'));

one-bar-chart

Edit: Yeah, the y-axis should really cover the stack completely by default; it's a weakness in the base graphics package that it doesn't. You can add ylim=c(0,1.2*sum(do.call(c,all[,3:4]))) as an argument to the barplot() call to force the y-axis to extend 20% beyond the high point of the stack. (It's unfortunate that you have to calculate that manually from the input data, but as I said, it's a weakness in the package.)
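
As a minimal sketch of that ylim adjustment for the one-bar chart, the snippet below hard-codes the column totals from the merged table shown earlier (14 failed, 18 successful) so that it runs on its own. The names totals and y_max are illustrative, not part of the original code:

```r
# Column totals taken from the merged table in the answer.
totals <- c(failed = 2 + 8 + 0 + 4, successful = 5 + 10 + 3 + 0)

# Upper axis limit 20% above the full height of the single stack.
y_max <- 1.2 * sum(totals)

# A one-column matrix produces a single stacked bar.
barplot(matrix(totals),
        main = "Successful and failed login attempts",
        ylab = "Login_Attempts",
        col = c("red", "blue"),
        legend.text = names(totals),
        ylim = c(0, y_max))
```

Here sum(totals) should give the same value as sum(do.call(c,all[,3:4])) on the merged data frame, since both add up every count in the two Login_Attempts columns.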

Also, with regard to my comment about the oneness of the bar, it's just more common for stacked bar charts to be used to compare multiple bars, rather than showing a single bar. (That's why my initial assumption was that you wanted a separate bar for each user/site.) Instead of a single stacked bar, normally you'd see a plain old bar chart showing the different data points side-by-side. But it really depends on your application, so do what works best for you.
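
For comparison, a side-by-side version can be sketched with the beside argument of barplot(). The matrix below simply re-enters the stackData values shown under "Resulting data" so the snippet is self-contained; mids is an illustrative name for the returned bar midpoints:

```r
# Same values as stackData above: one row per series, one column per user/site.
stackData <- matrix(c(2,  8, 0, 4,
                      5, 10, 3, 0),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("failed", "successful"),
                                    c("1@xxx.xxx.xxx", "2@xxx.xxy.yyy",
                                      "3@xxx.yyy.zzz", "4@xxx.yyy.zzz")))

# beside = TRUE draws the series next to each other instead of stacking them.
mids <- barplot(stackData, beside = TRUE,
                main = "Successful and failed login attempts",
                xlab = "User_ID@Site_Address", ylab = "Login_Attempts",
                col = c("red", "blue"),
                legend.text = rownames(stackData))
```

With beside = TRUE, barplot() returns a matrix of bar midpoints with the same shape as the input, which can be handy for adding text labels above individual bars.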