xyzzyrz

Reputation: 16456

Backtesting or custom splits in caret trainControl?

Is there any way to make train() run with custom train/test partitions of the data? I'm interested in backtesting time series data, where traditional resampling/CV/etc. would be inappropriate because it leaks future information into training. That is, if the data is ordered in time from 1...N, I repeatedly train on the data before a certain cutoff and predict on the data following the cutoff (up to a certain sliding window size). I couldn't work out how to do this while still leveraging the rest of caret's train(). Thanks in advance for any tips.
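To make the scheme concrete, here is a minimal base R sketch of the splits described above. The helper name `make_backtest_splits` and its arguments `initial_window` and `horizon` are hypothetical, chosen for illustration only; they are not part of caret. It assumes an expanding training window and a cutoff that advances by one horizon at a time.

```r
# Hypothetical helper (not a caret function): build backtest splits for
# rows 1..n that are already ordered in time.
make_backtest_splits <- function(n, initial_window, horizon) {
  stopifnot(initial_window >= 1, horizon >= 1, initial_window + horizon <= n)
  # Advance the cutoff by one horizon so the test windows do not overlap
  cutoffs <- seq(initial_window, n - horizon, by = horizon)
  list(
    # Train on everything up to and including the cutoff
    train = lapply(cutoffs, function(k) seq_len(k)),
    # Test on the next `horizon` rows after the cutoff
    test  = lapply(cutoffs, function(k) seq(k + 1, k + horizon))
  )
}

splits <- make_backtest_splits(n = 10, initial_window = 4, horizon = 2)
str(splits)
```

With these settings the training sets are rows 1 to 4, 1 to 6 and 1 to 8, and the matching test sets are rows 5 to 6, 7 to 8 and 9 to 10.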

Upvotes: 3

Views: 529

Answers (1)

topepo

Reputation: 14331

Max here.

You can specify custom resampling indices with trainControl(index = list()), where each element of the list holds the row indices of the training data that are used to fit the model in that resample.

...but train() will use everything else as a hold-out and I don't think that's what you want.
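A small sketch of why that matters for time-ordered data, assuming 10 rows and three expanding training windows (the slice names and sizes are made up for illustration). The hold-out implied by the behaviour described above is simply every row not listed in the training index:

```r
n <- 10

# Training rows for each resample (expanding window; cutoffs at rows 4, 6, 8)
train_idx <- list(Slice1 = 1:4, Slice2 = 1:6, Slice3 = 1:8)

# Rows that would be held out if everything outside the index is used:
# the complement of each training set.
implied_holdout <- lapply(train_idx, function(idx) setdiff(seq_len(n), idx))
implied_holdout$Slice1  # rows 5 through 10, not only the next window

# Passing the list to caret; this part only runs if caret is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::trainControl(index = train_idx)
}
```

So the first resample would be scored on all six remaining rows rather than on a fixed-size window right after the cutoff, which is the gap between what index offers and the backtest the question asks for.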

I've probably had about 10 different requests for this feature. It would take some modifications to train() to do it, but it shouldn't be too bad.

However, 1) I don't know jack about time series analysis (beyond simple basics) so some prototype code with one or two testing examples would be helpful and 2) until I finish the book (about 4 months) I won't really have time to do this.

So, it can be done with some modification if you are willing to contribute some technical bits and can wait a few months (which can be reduced depending on how proactive you would like to be).

Send me an email to the address listed on the package if you would like to discuss further.

Upvotes: 5
