Reputation: 269
I've got a dataset of diving behavior from tagged animals, and I'm struggling to fit a curve to the data. I think this is mainly because the X variable in this case is categorical rather than continuous. Let me give a bit of background:
My dataset has 184 observations of 14 variables:
tagID ddmmyy Hour.GMT. Hour.Local. X0 X3 X10 X20 X50 X100 X150 X200 X300 X400
1 122097 250912 0 9 0.0 0.0 0.3 12.0 15.3 59.6 12.8 0.0 0 0
2 122097 260912 0 9 0.0 2.4 6.9 5.5 13.7 66.5 5.0 0.0 0 0
3 122097 260912 6 15 0.0 1.9 3.6 4.1 12.7 39.3 34.6 3.8 0 0
4 122097 260912 12 21 0.0 0.2 5.5 8.0 18.1 61.4 6.7 0.0 0 0
5 122097 280912 6 15 2.4 9.3 6.0 3.4 7.6 21.1 50.3 0.0 0 0
6 122097 290912 18 3 0.0 0.2 1.6 6.4 41.4 50.4 0.0 0.0 0 0
The variables I'm interested in are X0:X400. These are depth bins, and the values represent the percent of the total time for that period of the day that the animal spent in that depth bin. So on the first line, it spent 0% of its time between 0-3 meters, 59.6% of its time between 100-150 meters, etc. With a bit of help from some answers to my last question here on Stack Overflow, I calculated the mean % time spent in each depth bin by this animal:
diving.means <- colMeans(diving[, -(1:4)])
lowerIntervalBound <- gsub("X", "", names(diving)[-(1:4)])
lowInts <- as.numeric(lowerIntervalBound)
plot(x=factor(lowInts), y=diving.means, xlab="Depth Bin (Meters—Lower Bound)", ylab="% Time Spent")
which provided me with this plot:
Unfortunately, because my data are means (a single value per bin) and not frequencies, I couldn't figure out how to plot them as a histogram. That's neither here nor there, as I can easily just input these as values and make the desired plot if necessary, but this does the trick analytically for now.
Now I've got multiple animals and different time bins that I'd like to compare. I'll eventually work out a system to weight the time spent in bins to get an average depth to compare statistically, but for now I just want to compare them visually, qualitatively, as well as produce plots that I can use in presentations and eventually publications. What I'd like to do is create a density curve representing my 'histogram,' and then plot those curves from multiple scenarios on a single plot to compare. However, I can't seem to make this work with the density() function, as I don't have frequency data. I sort of have densities calculated already, as % time spent in each bin, but they're not represented in raw format in my dataset as frequencies of categories, which I could then make histograms and density curves out of.
This is how my data look:
> diving.means
X0 X3 X10 X20 X50 X100 X150 X200 X300 X400
3.330978261 3.299456522 8.857608696 17.646195652 30.261413043 29.356521739 6.445108696 0.664130435 0.135869565 0.001630435
or:
> df<-data.frame(lowInts, diving.means)
> df
lowInts diving.means
X0 0 3.330978261
X3 3 3.299456522
X10 10 8.857608696
X20 20 17.646195652
X50 50 30.261413043
X100 100 29.356521739
X150 150 6.445108696
X200 200 0.664130435
X300 300 0.135869565
X400 400 0.001630435
And what I would like to produce is something that looks more or less like this (pulled it randomly from a publication—axes are unrelated to my data):
and then be able to isolate the curves and plot them together.
Thanks for any help you can provide!
Upvotes: 3
Views: 1260
Reputation: 7130
You already have the per-bin frequencies, so hist (which tabulates raw observations) cannot be used. You can use plot with spline interpolation for the density:
df <- read.table(text=" lowInts diving.means
X0 0 3.330978261
X3 3 3.299456522
X10 10 8.857608696
X20 20 17.646195652
X50 50 30.261413043
X100 100 29.356521739
X150 150 6.445108696
X200 200 0.664130435
X300 300 0.135869565
X400 400 0.001630435")
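library(splines)
# interpolating spline through the bin means, evaluated on a fine grid
dens <- predict(interpSpline(df[,1], df[,2]))
# bin means as a step plot, with the interpolated curve drawn on top
plot(df[,1], df[,2], type="s", ylim=c(0,40))
lines(dens, col="red", lwd=2)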
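One thing to keep in mind with the approach above: the bins have unequal widths (3 m near the surface, 100 m at depth), so the percentages are not directly comparable as densities, and an interpolating spline is free to dip below zero. A minimal sketch of an alternative, assuming you are happy to treat the values as % time per meter: interpolate the cumulative percentages with a monotone spline and take its derivative. The 500 m upper bound for the deepest bin is my assumption (the question does not give one), and the names upper_last, cum_pct and cdf are just for illustration.

```r
# bin lower bounds (m) and mean % time per bin, copied from the question
lower <- c(0, 3, 10, 20, 50, 100, 150, 200, 300, 400)
pct <- c(3.330978261, 3.299456522, 8.857608696, 17.646195652, 30.261413043,
         29.356521739, 6.445108696, 0.664130435, 0.135869565, 0.001630435)

# ASSUMPTION: the deepest bin is closed at 500 m; the question gives no upper bound
upper_last <- 500
breaks <- c(lower, upper_last)

# cumulative % time at each bin edge (0 at the surface)
cum_pct <- c(0, cumsum(pct))

# monotone (Fritsch-Carlson) Hermite interpolation of the cumulative curve
cdf <- splinefun(breaks, cum_pct, method = "monoH.FC")

# the derivative of the cumulative curve is % time per meter of depth
depth <- seq(0, upper_last, length.out = 501)
dens <- cdf(depth, deriv = 1)

plot(depth, dens, type = "l", col = "red", lwd = 2,
     xlab = "Depth (m)", ylab = "% time per meter")
```

Because the cumulative curve is interpolated monotonically, the derivative should not go negative, and further animals or time bins can be added to the same plot with lines(depth, other_dens).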
Upvotes: 1
Reputation: 115435
I think a step function is what you want. You could use stepfun to create this function.
I would work in long format, and then you could create step functions for the median or mean:
# assuming your data is called `diving` (a data.frame with the depth bins in columns 5:14)
library(data.table)
library(ggplot2)
# reshape to long format; `hours` keeps its original name but holds the depth-bin lower bound (m)
DTlong <- data.table(reshape(as.data.frame(diving), varying = list(5:14), direction = 'long',
                             times = c(0,3,10,20,50,100,150,200,300,400),
                             v.names = 'time.spent', timevar = 'hours'))
# mean and five-number summary of % time spent, per depth bin
DTsummary <- DTlong[, c(mean.d = mean(time.spent),
                        setattr(as.list(fivenum(time.spent)), 'names', c('min','lhinge','median','uhinge','max'))),
                    by = hours]
# step functions: 0 above the surface, then the summary value within each bin
f.median <- with(DTsummary, stepfun(hours, c(0, median)))
f.uhinge <- with(DTsummary, stepfun(hours, c(0, uhinge)))
f.lhinge <- with(DTsummary, stepfun(hours, c(0, lhinge)))
plot(f.median, main = 'median time spent', xlim = c(0,500), do.points = FALSE)
# the same step plot with ggplot2
ggplot(DTsummary, aes(x = hours)) + geom_step(aes(y = median))
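Since the question also asks about putting several scenarios on one plot, here is a rough sketch of how the stepfun idea could be extended to one curve per group. It uses only the six rows shown in the question, grouped by the Hour.GMT. column; the column names Hour.GMT. and Hour.Local. are my reading of the printed header, and depth_cols, means and curves are names I made up for the example.

```r
# the six rows shown in the question (row numbers dropped)
diving <- read.table(header = TRUE, text = "
tagID ddmmyy Hour.GMT. Hour.Local. X0 X3 X10 X20 X50 X100 X150 X200 X300 X400
122097 250912 0 9 0.0 0.0 0.3 12.0 15.3 59.6 12.8 0.0 0 0
122097 260912 0 9 0.0 2.4 6.9 5.5 13.7 66.5 5.0 0.0 0 0
122097 260912 6 15 0.0 1.9 3.6 4.1 12.7 39.3 34.6 3.8 0 0
122097 260912 12 21 0.0 0.2 5.5 8.0 18.1 61.4 6.7 0.0 0 0
122097 280912 6 15 2.4 9.3 6.0 3.4 7.6 21.1 50.3 0.0 0 0
122097 290912 18 3 0.0 0.2 1.6 6.4 41.4 50.4 0.0 0.0 0 0")

# depth-bin columns and their lower bounds in meters
depth_cols <- grep("^X[0-9]+$", names(diving), value = TRUE)
lower <- as.numeric(sub("X", "", depth_cols))

# mean % time per depth bin within each GMT time block
means <- aggregate(diving[depth_cols], by = list(hour = diving$Hour.GMT.), FUN = mean)

# one step function per time block: 0 above the surface, then the bin mean
curves <- lapply(seq_len(nrow(means)), function(i) {
  stepfun(lower, c(0, unlist(means[i, depth_cols])))
})
names(curves) <- means$hour

# draw the first curve, then overlay the rest in different colors
plot(curves[[1]], do.points = FALSE, xlim = c(0, 500), ylim = c(0, 70), col = 1,
     main = "mean % time spent by GMT time block",
     xlab = "Depth bin lower bound (m)", ylab = "% time spent")
for (i in seq_along(curves)[-1]) {
  lines(curves[[i]], do.points = FALSE, col = i)
}
legend("topright", legend = names(curves), col = seq_along(curves), lty = 1, title = "Hour (GMT)")
```

Swapping diving$Hour.GMT. for diving$tagID in the aggregate() call would give one curve per animal instead.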
Upvotes: 1