user2543622

Reputation: 6796

Plot two categorical variables in R

I am using the command below to plot two categorical variables in R.

Gender has 2 levels and Income has 9 levels.

spineplot(main$Gender,main$Income, xlab="Gender", ylab="Income levels: 1 is lowest",xaxlabels=c("Male","Female"))

It produces a chart like the one below:

[image: spineplot output]

  1. How can I plot this chart in color?
  2. How can I show the percentage of each income level within each box? For example, female income level 1 holds 21% of the data. How can I show 21% within the dark-colored area?

Update 1

Adding a reproducible example:

fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
                 1, 1, 1, 2, 1, 1, 1, 1, 1,2,2,2,2),
               levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(c(1:9),3))
spineplot(fail,gender)

Upvotes: 2

Views: 1589

Answers (2)

Marco Sandri

Reputation: 24272

An alternative to the interesting solution of @rawr is:

fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
                 1, 1, 1, 2, 1, 1, 1, 1, 1,2,2,2,2),
               levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(c(1:9),3))

mypalette <- colorRampPalette(c("lightblue","darkblue"))
tbl <- spineplot(fail, gender, xlab="Gender", ylab="Income levels: 1 is lowest",
     xaxlabels=c("Male","Female"), col=mypalette(nlevels(gender)) )
print(tbl)

#         Income levels: 1 is lowest
# Gender   1 2 3 4 5 6 7 8 9
#   male   2 1 2 1 3 2 2 2 1
#   female 1 2 1 2 0 1 1 1 2

print.perc <- function(k, tbl, ndigits=2, str.pct="%") {
   # These lines are the same as those spineplot uses
   # to compute the x-positions of the stacked bars
   nx <- nrow(tbl)
   off <- 0.02
   xat <- c(0, cumsum(prop.table(margin.table(tbl, 1)) + off))
   posx <- (xat[1L:nx] + xat[2L:(nx + 1L)] - off)/2
   # Proportions by row (gender)       
   ptbl <- prop.table(tbl,1)
   # Define labels as strings with a given format
   lbl <- paste(format(round(100*ptbl[k,], ndigits), nsmall=ndigits), str.pct, sep="")
   # Print labels
   # cumsum(ptbl[k,])-ptbl[k,]/2 is the vector of y-positions
   # for the centers of each stacked bar
   text(posx[k], cumsum(ptbl[k,])-ptbl[k,]/2, lbl)
}

# Print income levels for males and females
strsPct <- c("%","%")
for (k in 1:nrow(tbl)) print.perc(k, tbl, ndigits=2, str.pct=strsPct[k])
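
To check the labels without reading them off the plot, the same row-wise percentages can be tabulated directly. This is only a minimal sketch on the example data from the question; the names `pct` and `keep` are illustrative and not part of the code above. It also flags empty cells (such as female, level 5), whose labels you may prefer to skip because the segment has zero height.

```r
# Example data from the question
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
                 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2),
               levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(1:9, 3))

# Counts by row (fail) and column (gender)
tbl <- table(fail, gender)

# Row-wise percentages: each row sums to 100 (up to rounding)
pct <- round(100 * prop.table(tbl, 1), 2)
print(pct)

# Cells with at least one observation; a label for an empty cell
# would sit on a zero-height segment
keep <- tbl > 0
```

If needed, `keep[k, ]` could be used to subset the positions and labels passed to `text()` for row `k`.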


Hope this helps.

Upvotes: 2

rawr

Reputation: 20811

I think it may be easier to do this with a barplot, since spineplot only returns the underlying table (invisibly) and not the bar positions.

The default would be the following, but you can scale the widths of the bars by some other variable (note that barplot returns the x-axis midpoints of the bars):

par(mfrow = 1:2)
(barplot(table(gender, fail)))
# [1] 0.7 1.9
(barplot(table(gender, fail), width = table(fail)))
# [1] 10.7 26.9


With some final touches we get

tbl <- table(gender, fail)
prp <- prop.table(tbl, 2L)
yat <- prp / 2 + apply(rbind(0, prp[-nrow(prp), ]), 2L, cumsum)

bp <- barplot(prp, width = table(fail), axes = FALSE, col = rainbow(nrow(prp)))

axis(2L, at = yat[, 1L], labels = levels(gender), lwd = 0)
axis(4L)

text(rep(bp, each = nrow(prp)), yat, sprintf('%0.f%%', prp * 100), col = 0)
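
The `yat` line is fairly dense: for each bar it takes half of each segment's height and adds the total height of the segments below it, which gives the vertical center of every segment. A small sketch with one hypothetical column of proportions (`p` and `mid` are made-up names for illustration) shows the arithmetic:

```r
# Hypothetical proportions for a single stacked bar (sum to 1)
p <- c(0.2, 0.5, 0.3)

# Height of everything stacked below each segment
below <- cumsum(c(0, p[-length(p)]))

# Center of each segment: base plus half its own height
mid <- p / 2 + below
mid
# 0.10 0.45 0.85
```

The same values come out of `cumsum(p) - p / 2`, which is the form used for the y-positions in the other answer.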


Compare to

spineplot(fail, gender, col = rainbow(nlevels(gender)))


Upvotes: 4
