I am not entirely sure what to name the problem I am having with the plotting function in R...
In my original dataset I had a variable called age with these levels: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 2X, 30, 40, 50, 60. When I plotted age using:
plot(age, xlab="Age", ylab="Number of observations")
I got a bar plot with age on the x-axis and number of observations on the y-axis.
I then removed 2X (for people somewhere in their 20s) from the data and re-ran the same code. The plot now has age on the y-axis instead.
If anyone has any ideas about why the plot now has the age on the y-axis, please let me know! Thank you in advance for your help!
Diagnostic
You are running into an S3 method dispatch issue. plot is a generic function:
methods(plot)
# [1] plot.acf* plot.data.frame* plot.decomposed.ts*
# [4] plot.default plot.dendrogram* plot.density*
# [7] plot.ecdf plot.factor* plot.formula*
#[10] plot.function plot.hclust* plot.histogram*
#[13] plot.HoltWinters* plot.isoreg* plot.lm*
#[16] plot.medpolish* plot.mlm* plot.ppr*
#[19] plot.prcomp* plot.princomp* plot.profile.nls*
#[22] plot.raster* plot.spec* plot.stepfun
#[25] plot.stl* plot.table* plot.ts
#[28] plot.tskernel* plot.TukeyHSD*
Comments above asked you to provide str(age) before and after removing 2X, because that information tells us which method is dispatched when plot is called.
When the data contain 2X, age is definitely a factor. So when you call plot, plot.factor is invoked and a bar plot is produced.
Once you removed 2X, it seems that age somehow became a numeric variable. So when you call plot, plot.default is invoked and a scatter plot is produced, in which case plot(age) is essentially doing plot.default(1:length(age), age).
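To make the dispatch concrete, here is a minimal sketch with made-up data. The helper plot_method_for is hypothetical (it is not part of R); it only checks, for each class of the object, whether a registered plot method exists, and otherwise falls back to plot.default. It assumes a plain R session in which no extra package has registered a plot method for numeric vectors.

```r
# Toy stand-ins for the two versions of `age` (values are invented).
age_factor  <- factor(c("15", "16", "2X", "16"))  # contains the non-numeric "2X"
age_numeric <- c(15, 16, 16)                      # "2X" gone, values are numbers

# Hypothetical helper: report which plot method would be picked for x.
plot_method_for <- function(x) {
  for (cls in class(x)) {
    # getS3method() returns NULL when no method exists and optional = TRUE
    if (!is.null(getS3method("plot", cls, optional = TRUE))) {
      return(paste0("plot.", cls))
    }
  }
  "plot.default"  # fallback when no class-specific method is found
}

print(class(age_factor))             # class that drives dispatch for the factor
print(plot_method_for(age_factor))   # method chosen for the factor version
print(class(age_numeric))            # class that drives dispatch for the numbers
print(plot_method_for(age_numeric))  # method chosen for the numeric version
```

Running str(age) or class(age) on your own data before and after the change should tell you the same thing without any helper.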
Solution
One way that would definitely work is
plot(factor(age), xlab="Age", ylab="Number of observations")
However, I am still curious how you removed the 2X subset so that age became numeric. Normally, if age is a factor variable in R, removing a subset does not change the variable's class.
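As a small illustration of that last point, the sketch below (with invented values) drops the 2X observations from a factor inside R. The result is still a factor, and the unused level 2X even stays in the level set until droplevels() is called.

```r
# Invented example data: a factor that includes the "2X" level.
age <- factor(c("15", "16", "2X", "17", "2X"))

# Remove the 2X observations by subsetting.
age_kept <- age[age != "2X"]

print(class(age_kept))    # the class is unchanged by subsetting
print(levels(age_kept))   # the unused "2X" level is still listed

# Drop levels that no longer occur in the data.
age_clean <- droplevels(age_kept)
print(levels(age_clean))
```

Because age_kept is still a factor, plot(age_kept) would still go through plot.factor and draw a bar plot (with an empty bar for 2X unless the level is dropped).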
Presumably age is stored in a .txt or .csv file and you read it via scan(), read.table() or read.csv(). When you removed 2X, you removed it from the file and read the data into R again. In that case, R identifies age as a different class at read-in time.
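The read-in explanation can be reproduced with a small sketch. The file contents here are invented and passed inline through the text argument of read.csv() instead of a real file. stringsAsFactors = TRUE is set explicitly because that was the default before R 4.0.0; with the newer default (FALSE), the column containing 2X would be read as character instead of factor.

```r
# Invented file contents: one column named `age`, with and without "2X".
csv_with_2x    <- "age\n15\n16\n2X\n17"
csv_without_2x <- "age\n15\n16\n17"

# Read both versions the way a .csv file would be read.
with_2x    <- read.csv(text = csv_with_2x,    stringsAsFactors = TRUE)
without_2x <- read.csv(text = csv_without_2x, stringsAsFactors = TRUE)

print(class(with_2x$age))     # non-numeric entry forces a factor column
print(class(without_2x$age))  # all entries parse as numbers

# Converting back to a factor restores the bar-plot behaviour of plot().
age_as_factor <- factor(without_2x$age)
print(class(age_as_factor))
```

If you want the column type to be stable regardless of the file contents, the colClasses argument of read.csv() is one way to fix it at read-in time.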