Saurabh Sharma

Trying to understand cdplot in R

Hi, I have an academic data set in which students who raise their hands more often get higher marks.

Marks are stored in the Class1 column, where H represents higher marks and L represents lower marks.

I got the following plot from cdplot in R, but as I read it, it looks as if students who raise their hands more get lower marks, which is wrong. I am not able to interpret the output correctly.

Please help me understand what the plot is saying.

I used the following code:

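getwd()  # the CSV file must be in this working directory
# cdplot() expects a factor response; read.csv() returns character columns by default since R 4.0.0
Reading.df <- read.csv("xAPI-Edu-Data.csv", stringsAsFactors = TRUE)
cdplot(Class1 ~ raisedhands, data = Reading.df)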

and got the output below:

[Image: cdplot of Class1 against raisedhands]

Answers (1)

StupidWolf

It's the other way around: the dark band represents class H. As you move towards a higher number of raised hands, more and more of the y-axis is taken up by the dark band, indicating a higher proportion of class H. Another way to think about this plot is to imagine splitting your x-axis variable into categories and asking what proportion of each class falls in each category as the x value increases.
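
To do that binning on your own data, a small helper along these lines could be used. `share_by_bin` is a hypothetical name, not a base R function; it is only a sketch and assumes the response is a factor.

```r
# Hypothetical helper (not part of base R): split x into `bins` equal-width
# intervals and return, for each interval, the proportion of each level of y.
# Every non-empty row sums to 1, like a vertical slice through a cdplot.
share_by_bin <- function(x, y, bins = 5) {
  tab <- table(cut(x, bins), y)    # counts: interval x class
  prop.table(tab, margin = 1)      # divide each row by its total
}
```

With the question's data this would be called as `share_by_bin(Reading.df$raisedhands, Reading.df$Class1)`; if the dark band is indeed class H, the H column should grow towards the higher bins.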

For example, take the iris dataset and recode it into two classes, setosa and others. We divide the continuous Sepal.Width variable into 5 ordinal bins and look at the distribution of species within each bin:

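data = iris
# collapse the three species into two classes: "setosa" vs "others"
data$Species = factor(ifelse(data$Species=="setosa","setosa","others"))
# cross-tabulate class against 5 equal-width Sepal.Width bins
tab = table(data$Species,cut(data$Sepal.Width,5))
# divide each column by its total so every bar shows within-bin proportions
barplot(sweep(tab,2,colSums(tab),"/"),
        xlab="Sepal.Width ranges",ylab="Composition of species",
        col = c("lightblue","darkblue"))
legend("topright",fill=c("lightblue","darkblue"),rownames(tab),
       xpd=TRUE, horiz=TRUE,inset=c(0,-0.3))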

[Image: stacked bar chart of the species composition within each Sepal.Width range]

The higher Sepal.Width ranges are dominated by "setosa". Now we do the same with cdplot:

[Image: cdplot of Species against Sepal.Width]
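
The cdplot call itself is not shown above. Assuming the recoded `data` from the barplot example, a call along these lines should give a comparable plot; the colours are an assumption, chosen to match the bar chart. Since `cdplot()` invisibly returns its conditional density functions (cumulative over the factor levels), the band heights can also be read off numerically:

```r
# Two-class version of iris, as in the barplot example
data <- iris
data$Species <- factor(ifelse(data$Species == "setosa", "setosa", "others"))

# Conditional density plot of Species given Sepal.Width. Levels are ordered
# alphabetically ("others", "setosa"), so "others" is the bottom band.
# Colours are an assumption, picked to match the bar chart.
cd <- cdplot(Species ~ Sepal.Width, data = data,
             col = c("lightblue", "darkblue"))

# The first returned function gives the height of the bottom band,
# i.e. the estimated P(Species == "others" | Sepal.Width).
p_others <- cd[[1]](c(2.5, 3.0, 3.5, 4.0))
round(1 - p_others, 2)  # estimated share of "setosa" at each width
```

On the question's data, assuming the levels of Class1 are H and L in that order, the first returned function would likewise give the estimated P(Class1 == "H" | raisedhands), which is the height of the dark band.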
