Pake

Reputation: 1118

Plotly Histogram Retains Ghost Categorical X Values

I have data with fifty different categorical values in a column labelled "cat", and a second column, "amount", holding a continuous numerical value. I only want to plot the rows whose "amount" is greater than 5. Why do I get ghost labels on my x-axis for intermediate categories that my subset should have omitted?

Example code:

cat <- c("a", "b", "c", "d", "e")
amount <- c(4, 15, 18, 2, 9)

df <- data.frame(cat = cat, amount = amount)

df1 <- subset(df, amount > 5)

library(plotly)

p <- plot_ly(df1, x = ~cat, y = ~amount)
p

df1 printed out:

  cat amount
2   b     15
3   c     18
5   e      9

And the plot generated: [image: Erroneous Plot]

It is interesting that "a" doesn't appear on my x-axis, but "d" does. I take it there is something going on with the row numbers, but why does this happen, and how can I prevent it?

Thank you in advance.

Upvotes: 0

Views: 155

Answers (1)

A. Suliman

Reputation: 13135

subset() does not drop the unused levels of a factor, as the output of str() shows:

str(df1)
 'data.frame':  3 obs. of  2 variables:
 $ cat   : Factor w/ 5 levels "a","b","c","d",..: 2 3 5
 $ amount: num  15 18 9

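Creating the data frame with stringsAsFactors = FALSE keeps cat as a character vector, which you can convert to a factor after subsetting or use directly. (From R 4.0.0 onward, FALSE is already the default for data.frame(), so the ghost levels only appear when cat has been made a factor explicitly.)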

df <- data.frame(cat = cat, amount = amount, stringsAsFactors = FALSE)
df1 <- subset(df, amount > 5)
plot_ly(df1, x = ~cat, y = ~amount)
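
If cat has to stay a factor, an alternative is to drop the unused levels after subsetting with base R's droplevels(). This is a minimal sketch reusing the question's data, not part of the original answer. stringsAsFactors = TRUE is set explicitly as an assumption so that the behaviour reproduces on R 4.0.0 and later, and the plotly call is guarded in case the package is not installed.

```r
# Rebuild the question's data; stringsAsFactors = TRUE is set explicitly
# (an assumption for this sketch) so that cat is a factor on any R version.
cat <- c("a", "b", "c", "d", "e")
amount <- c(4, 15, 18, 2, 9)
df <- data.frame(cat = cat, amount = amount, stringsAsFactors = TRUE)

# subset() keeps all five factor levels even though only three rows remain.
df1 <- subset(df, amount > 5)
levels_before <- levels(df1$cat)

# droplevels() removes the levels that no longer occur in the data.
df1 <- droplevels(df1)
levels_after <- levels(df1$cat)

# Plot only if plotly is available; the x-axis is built from the cleaned factor.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(df1, x = ~cat, y = ~amount)
}
```

Comparing levels_before with levels_after shows whether any unused levels were still attached to the subset before plotting.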


Upvotes: 2
