Reputation: 1118
I have data with fifty distinct categorical values in a column labelled "cat", and a second column, "amount", holding a continuous numerical value. I only want to plot the rows whose "amount" is greater than 5. Why do I get a ghost label on my x-axis for an intermediate row that should have been omitted by my subset?
Example code:
cat<-c("a","b","c","d","e")
amount<-c(4,15,18,2,9)
df<-data.frame(cat=cat,amount=amount)
df1<-subset(df,amount >5)
library(plotly)
p <- plot_ly(df1, x = ~cat, y = ~amount)
p
df1 printed out:
  cat amount
2   b     15
3   c     18
5   e      9
And the plot generated:
It is interesting that "a" doesn't appear on my x axis, but "d" does. I take it there is something going on with the row numbers, but why is this and how can I prevent this from happening?
Thank you in advance.
Upvotes: 0
Views: 155
Reputation: 13135
subset does not drop the unused levels of a factor, as shown below:
str(df1)
'data.frame': 3 obs. of 2 variables:
$ cat : Factor w/ 5 levels "a","b","c","d",..: 2 3 5
$ amount: num 15 18 9
So stringsAsFactors = FALSE will store cat as a character vector, which you can convert to a factor after subsetting or use directly.
df <- data.frame(cat=cat,amount=amount, stringsAsFactors = FALSE)
df1 <- subset(df,amount >5)
plot_ly(df1, x = ~cat, y = ~amount)
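If cat has to stay a factor, another option is to discard the unused levels after subsetting. The following is a minimal sketch using base R only, not part of the original answer. It builds cat explicitly with factor() so that it behaves the same regardless of R version (since R 4.0.0, data.frame() no longer converts strings to factors by default). The names df2 and df3 are introduced here purely for illustration.

```r
# Recreate the question's data with cat stored explicitly as a factor
df <- data.frame(
  cat = factor(c("a", "b", "c", "d", "e")),
  amount = c(4, 15, 18, 2, 9)
)

# subset() keeps all five levels even though only three rows remain
df1 <- subset(df, amount > 5)
levels(df1$cat)   # "a" "b" "c" "d" "e"

# Option 1: droplevels() removes levels that no longer occur in the data
df2 <- droplevels(df1)
levels(df2$cat)   # "b" "c" "e"

# Option 2: rebuild the factor from the values that are still present
df3 <- df1
df3$cat <- factor(df3$cat)
levels(df3$cat)   # "b" "c" "e"
```

Passing df2 or df3 to plot_ly(df2, x = ~cat, y = ~amount) should then leave only "b", "c" and "e" as categories for the x-axis.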
Upvotes: 2