Reputation: 21212
I read a CSV file into R and now have a data frame called data.
head(data)
Date Open High Low Close Volume
1 31-Dec-14 223.09 225.68 222.25 222.41 2402097
2 30-Dec-14 223.99 225.65 221.40 222.23 2903242
3 29-Dec-14 226.90 227.91 224.02 225.71 2811828
4 26-Dec-14 221.51 228.50 221.50 227.82 3327016
5 24-Dec-14 219.77 222.50 219.25 222.26 1333518
6 23-Dec-14 223.81 224.32 219.52 220.97 4513321
tail(data)
Date Open High Low Close Volume
499 9-Jan-13 34.01 34.19 33.40 33.64 697979
500 8-Jan-13 34.50 34.50 33.11 33.68 1283985
501 7-Jan-13 34.80 34.80 33.90 34.34 441909
502 4-Jan-13 34.80 34.80 33.92 34.40 673993
503 3-Jan-13 35.18 35.45 34.75 34.77 741941
504 2-Jan-13 35.00 35.45 34.70 35.36 1194710
These are the daily prices of one stock over a two-year period, from January 1st, 2013 to December 31st, 2014. For now I just want to be able to group by year when applying any function or formula.
So, let's say I want: median(data$Close)
returns: 177.515
Is there a way to tell R to return these numbers for each of the two years as opposed to just all data?
e.g. combining R with a familiar SQL statement:
median(data$Close)
GROUP BY YEAR(Date);
I'm hoping to get something returned like:
2013 167.5
2014 175
Upvotes: 3
Views: 12485
Reputation: 3525
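You could try this (with the help of the lubridate package):
library(lubridate)
# Parse strings such as "31-Dec-14"; %b relies on the locale's month abbreviations
years <- year(as.Date(data$Date, "%d-%b-%y"))
# Median of Close within each year
tapply(data$Close, years, median)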
Or, using only built-in R functions:
dates <- as.Date(data$Date, "%d-%b-%y")
# Extract the four-digit year as a character string, e.g. "2014"
years <- format(dates, "%Y")
tapply(data$Close, years, median)
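For anyone who wants to run this without the original CSV, here is a minimal sketch of the base-R approach. It rebuilds only the twelve rows shown in the question's head() and tail() output, so the medians will not match those of the full 504-row file. The Year column and the names by_year and by_year_df are made up for illustration, and the aggregate() call is an alternative I am adding for anyone who prefers a data frame result. It also assumes an English locale so that %b recognises "Dec" and "Jan".

```r
# Twelve rows copied from the head() and tail() output in the question.
data <- data.frame(
  Date  = c("31-Dec-14", "30-Dec-14", "29-Dec-14", "26-Dec-14",
            "24-Dec-14", "23-Dec-14", "9-Jan-13", "8-Jan-13",
            "7-Jan-13", "4-Jan-13", "3-Jan-13", "2-Jan-13"),
  Close = c(222.41, 222.23, 225.71, 227.82, 222.26, 220.97,
            33.64, 33.68, 34.34, 34.40, 34.77, 35.36),
  stringsAsFactors = FALSE
)

# Parse the dates and keep the four-digit year as a grouping column.
# Assumption: the session locale uses English month abbreviations.
data$Year <- format(as.Date(data$Date, "%d-%b-%y"), "%Y")

# Named result, one median per year (the GROUP BY YEAR(Date) equivalent).
by_year <- tapply(data$Close, data$Year, median)
print(by_year)

# Same grouping, returned as a data frame with Year and Close columns.
by_year_df <- aggregate(Close ~ Year, data = data, FUN = median)
print(by_year_df)
```

tapply() returns a named array, which is handy for quick lookups such as by_year[["2014"]], while aggregate() returns a data frame that is easier to merge or plot. Any other summary function (mean, max, sd) can be swapped in for median.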
Upvotes: 6