marino89

Reputation: 909

R: High-frequency data statistical analysis

I'm working with tick data and would like some basic information about the distribution of the changes in tick prices. My dataset consists of tick data covering 10 trading days. I've taken the first difference of the tick prices:

                     Tick spread
2010-02-02 08:00:04   -1
2010-02-02 08:00:04    1
2010-02-02 08:00:04    0
2010-02-02 08:00:04    0
2010-02-02 08:00:04    0
2010-02-02 08:00:04   -1
2010-02-02 08:00:05    1
2010-02-02 08:00:05    1

I've created an array that gives the index of the first and last tick of each day:

       Open  Close
[1,]      1  59115
[2,]  59116 119303
[3,] 119304 207300
[4,] 207301 351379
[5,] 351380 426553
[6,] 426554 516742
[7,] 516743 594182
[8,] 594183 683840
[9,] 683841 754962
[10,] 754963 780725

For each day, I would like to know the empirical distribution of my tick spreads. I know I can use the R function table(), but it returns a table object whose length varies from day to day. A second problem is that on some days I see spreads of 3 points, whereas on the following days all spreads are smaller than 3 points.

First day table() output:

 -3    -2    -1     0     1     2     3 
  1    19  6262 46494  6321    16     2

Second day table() output:

-2    -1     0     1     2     3     5 
27  5636 48902  5588    33     1     1

What I would like is a data frame containing the table() output for every day in my tick sample. Any ideas? Thanks.

Upvotes: 2

Views: 731

Answers (2)

Joshua Ulrich

Reputation: 176648

Just use a 2-dimensional table, using as.Date(index(x)) as the rows:

library(xts)  # provides .xts() and index()

# create some example data
set.seed(21)
p <- sort(runif(6))*(1:6)^2
p <- c(p,rev(p)[-1])
p <- p/sum(p)
P <- sample(-5:5, 1e5, TRUE, p)
x <- .xts(P, (1:1e5)*5)
# create table
table(as.Date(index(x)), x)
#             x
#                -5   -4   -3   -2   -1    0    1    2    3    4    5
#   1970-01-01   22  141  527 1623 2968 6647 2953 1700  538  139   21
#   1970-01-02   31  142  548 1596 2937 6757 2874 1677  529  167   22
#   1970-01-03   26  172  547 1599 2858 6814 2896 1681  504  163   20
#   1970-01-04   23  178  537 1645 2855 6805 2891 1626  537  165   18
#   1970-01-05   23  139  490 1597 3028 6740 2848 1724  505  158   28
#   1970-01-06   21  134  400 1304 2266 5496 2232 1213  397  112   26
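
If the spreads are held in a plain vector together with an Open/Close index matrix like the one in the question, rather than in an xts object, the same 2-dimensional table idea can be sketched in base R. This is only a minimal sketch under assumptions: the names `spread`, `bounds`, `day` and `day_counts` are hypothetical, the data are simulated, and the day ranges are assumed to be contiguous and to cover the whole vector. Because all days go into one table, every row gets the same columns, and values that never occur on a given day show up as 0.

```r
# Simulated stand-ins for the asker's objects (hypothetical names and data)
set.seed(1)
spread <- sample(-3:3, 1000, replace = TRUE)   # tick spreads for all days
bounds <- cbind(Open  = c(1, 301, 651),        # index of the first tick of each day
                Close = c(300, 650, 1000))     # index of the last tick of each day

# Label every tick with its day number, assuming the ranges are contiguous
day <- rep(seq_len(nrow(bounds)),
           times = bounds[, "Close"] - bounds[, "Open"] + 1)

# One row per day, one column per spread value seen anywhere in the sample
tab <- table(day, spread)

# Convert the 2-dimensional table to a wide data frame
day_counts <- as.data.frame.matrix(tab)
day_counts
```

as.data.frame.matrix() keeps the wide layout (one row per day); calling as.data.frame() on the table instead would give a long layout with one row per day/value pair.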

Upvotes: 2

shoonya

Reputation: 301

If you want the frequency distribution for the entire 10-day period, just concatenate the data and do the same. Is that what you want to do?
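
As a rough illustration of the pooled case, assuming all days are already concatenated into a single vector (the name `spread` and the simulated data are hypothetical):

```r
# Simulated spreads for the whole period, all days pooled (hypothetical data)
set.seed(1)
spread <- sample(-3:3, 1000, replace = TRUE)

pooled      <- table(spread)        # counts over the whole sample
pooled_freq <- prop.table(pooled)   # relative frequencies, summing to 1
pooled_freq
```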

Upvotes: 0
