Reputation: 1047
I am creating graphs to compare two movie datasets and am currently working on a boxplot. I have IMDb data and the MovieLens 100k data (from here: http://grouplens.org/datasets/movielens/).
For IMDb it was fairly easy to create these boxplots; the data frame looked like this:
For MovieLens, however, the genres look like this:
How would I create a boxplot when a single movie has multiple genres? Ideally I would combine it with the IMDb boxplot that I already have, which looks like this:
Currently, the code for the IMDb one is like this:
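# Convert Rating to numeric via character (safe if it is stored as a factor)
all_movies$Rating <- as.numeric(as.character(all_movies$Rating))
# Inside the Shiny server function; requires ggplot2
output$boxplot <- renderPlot({
  ggplot(all_movies) + geom_boxplot(aes(x = Genre, y = Rating))
})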
How could I create something similar for MovieLens?
Upvotes: 0
Views: 296
Reputation: 1266
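Gregor already suggested what I also think is the best solution: split the genre string into one logical column per genre, then melt the data to long format so that each movie contributes one row per genre.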
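library(reshape2)  # provides melt()
library(ggplot2)
# example df
lens <- data.frame(movie = c('A', 'B'), genre = c('Adventure|Animation', 'Comedy|Animation'), rating = 8:9)
# split each genre string once on a literal "|"
genre_list <- strsplit(as.character(lens$genre), "|", fixed = TRUE)
genres <- unique(unlist(genre_list))
# create one logical column per genre
for (g in genres) {
  # exact match on the split parts, so genre names are never treated as regular expressions
  lens[[g]] <- sapply(genre_list, function(x) g %in% x)
}
lens$genre <- NULL
# melt for ggplot and keep only the genres each movie actually has
lens <- melt(lens, id.vars = c('movie', 'rating'))
lens <- lens[lens$value, ]
ggplot(lens, aes(x = variable, y = rating)) + geom_boxplot()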
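As an alternative sketch, the same long format can probably be built in base R without the intermediate indicator columns and without melt: split each genre string and repeat the movie and rating once per genre. The names lens_raw and lens_long are made up for this illustration, and the toy data simply mirrors the example above.

```r
# Toy data with the same layout as the example above (hypothetical name: lens_raw)
lens_raw <- data.frame(movie = c("A", "B"),
                       genre = c("Adventure|Animation", "Comedy|Animation"),
                       rating = 8:9,
                       stringsAsFactors = FALSE)

# Split on a literal "|" (fixed = TRUE avoids regex escaping)
parts <- strsplit(lens_raw$genre, "|", fixed = TRUE)
n <- lengths(parts)  # number of genres per movie

# One row per (movie, genre) pair; "variable" matches the column name melt() would produce
lens_long <- data.frame(movie = rep(lens_raw$movie, n),
                        rating = rep(lens_raw$rating, n),
                        variable = unlist(parts),
                        stringsAsFactors = FALSE)
```

Here lens_long has four rows (A/Adventure, A/Animation, B/Comedy, B/Animation) and can be passed to the same ggplot call with x = variable and y = rating.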
If you want both movie databases on the same plot, create the same structure for the IMDb data, add a source column to each data frame (imdb$source <- "IMDb"; lens$source <- "MovieLens") and rbind them (df <- rbind(imdb, lens)). Both data frames need identical column names for rbind to work.
The plot would be:
ggplot(df,aes(x=variable,y=rating,col=source)) + geom_boxplot()
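A minimal end-to-end sketch of the combination step, using made-up toy data: the data frame names imdb and lens and all values are assumptions for illustration, and both are assumed to already be in long format with the columns movie, rating and variable.

```r
# Hypothetical long-format data, one row per (movie, genre) pair
imdb <- data.frame(movie = c("A", "A", "C"),
                   rating = c(7.5, 7.5, 6.1),
                   variable = c("Adventure", "Animation", "Comedy"),
                   stringsAsFactors = FALSE)
lens <- data.frame(movie = c("A", "A", "B"),
                   rating = c(8, 8, 9),
                   variable = c("Adventure", "Animation", "Comedy"),
                   stringsAsFactors = FALSE)

# Tag each data frame with its origin, then stack them
imdb$source <- "IMDb"
lens$source <- "MovieLens"
df <- rbind(imdb, lens)

# Draw the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = variable, y = rating, col = source)) + geom_boxplot()
  print(p)
}
```

If the two sources use different rating scales (IMDb ratings run up to 10, MovieLens ratings up to 5), rescaling one of them before the rbind should make the boxes easier to compare.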
Upvotes: 1