Reputation: 3906
I have daily weather data with columns for the day of the month, the month, the year, and the data. I need to add another column for the day of the year, e.g. 1 to 365 (or 366 for leap years).
I'm not much of a programmer, but I am familiar with seq(), e.g. seq(1, 365)
But the above would terminate at 365. I need to sequentially increase the number while accounting for the year, so that the sequence starts over every year (and accounts for leap years). In this example, all weather data begin on Jan. 1st.
Any ideas/suggestion/pointers much appreciated.
Edit: Example data
example.data <- structure(list(V1 = 1:6, V2 = c(1L, 1L, 1L, 1L, 1L, 1L),
V3 = c(1950L, 1950L, 1950L, 1950L, 1950L, 1950L),
V4 = c(NA, NA, NA, NA, NA, NA),
V5 = c(0, 0, 0, 0, 0, 0)),
.Names = c("V1", "V2", "V3", "V4", "V5"), row.names = c(NA, 6L), class = "data.frame")
Upvotes: 4
Views: 1031
Reputation: 460
Assuming your dataset is named df, with year, month, and day columns named Y, m, and d, you could construct a date field:
df$date <- as.Date(paste(df$Y, df$m, df$d, sep="-"), "%Y-%m-%d")
Then use strftime() to extract the %j (day of the year) field from that date:
df$day_of_year <- as.numeric(strftime(df$date, "%j"))
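For the column layout in the question, the same idea might look like the sketch below. It assumes V1 is the day of the month, V2 the month, and V3 the year (my reading of the example data), and the three rows are made up to show a non-leap year ending at 365 and a leap year ending at 366.

```r
# Hypothetical rows in the question's layout:
# V1 = day of month, V2 = month, V3 = year (assumed meaning of the columns)
df <- data.frame(V1 = c(1L, 31L, 31L),
                 V2 = c(1L, 12L, 12L),
                 V3 = c(1950L, 1951L, 1952L))

# Build a Date from year-month-day, then pull out the day of the year
df$date <- as.Date(paste(df$V3, df$V2, df$V1, sep = "-"), format = "%Y-%m-%d")
df$day_of_year <- as.numeric(strftime(df$date, "%j"))

df$day_of_year
## [1]   1 365 366
```

Because the day of the year comes from the date itself, leap years need no special handling.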
Upvotes: 5
Reputation: 24945
Try this code, assuming your "year" column is named "V3":
Edit: More seriously, pasting a picture of your data is a bad idea; see here for how to include your data to make it easier for people to help. Including dput(head(data)) is almost always best.
For your problem, read in your data:
z <- read.csv("test.data.txt", sep="\t", header = FALSE)
Then use dplyr to group the rows by year and number them within each year using seq_along():
library(dplyr)
mydat <- z %>% group_by(V3) %>%
mutate(day = seq_along(V3))
We can verify we got some 366s:
sum(mydat$day == 366)
sum(mydat$day == 365)
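If you would rather not load dplyr, a roughly equivalent per-year counter can be sketched in base R with ave(). The toy data frame here is hypothetical, and like the dplyr version this assumes the rows are sorted by date with no missing days.

```r
# Hypothetical toy data: three rows for 1950, two rows for 1951
z <- data.frame(V3 = rep(c(1950L, 1951L), times = c(3, 2)))

# ave() applies seq_along() within each year and puts the results
# back in the original row positions
z$day <- ave(seq_along(z$V3), z$V3, FUN = seq_along)

z$day
## [1] 1 2 3 1 2
```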
Upvotes: 4
Reputation: 37794
R has a Date class, which is a good first step; you can get that by pasting your columns into "Y-M-D" format and then calling as.Date. But there's an even better option, the POSIXlt class, which contains exactly the information you want in its yday field, as well as lots of other potentially useful information. So I convert the Date to POSIXlt format and get the day of the year; since yday starts at zero, I then add 1.
dat <- data.frame(d=1:6,
m=rep(c(1,2,12), 2),
y=rep(c(1950, 1951), each=3))
dat$Date <- as.Date(with(dat, paste(y, m, d, sep="-")))
dat$doy <- as.POSIXlt(dat$Date)$yday + 1
dat
##   d  m    y       Date doy
## 1 1  1 1950 1950-01-01   1
## 2 2  2 1950 1950-02-02  33
## 3 3 12 1950 1950-12-03 337
## 4 4  1 1951 1951-01-04   4
## 5 5  2 1951 1951-02-05  36
## 6 6 12 1951 1951-12-06 340
The advantage of this is that it works correctly even if the order of your rows is changed or a particular day is missing. It's almost never a good idea to have your analysis depend on the order of the data.
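As a quick illustration of that point, here is a small sketch with made-up rows that are out of order and skip most of the year. 1952 is a leap year, so March 3rd should come out as day 63.

```r
# Hypothetical rows: out of chronological order, with most days missing
dat2 <- data.frame(d = c(3, 1), m = c(3, 1), y = c(1952, 1952))

dat2$Date <- as.Date(with(dat2, paste(y, m, d, sep = "-")))
dat2$doy <- as.POSIXlt(dat2$Date)$yday + 1  # yday is zero-based

dat2$doy
## [1] 63  1
```

A row counter would have labeled these two rows 1 and 2, whereas the date-based value depends only on each row's own date.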
Upvotes: 4