user2474794
user2474794

Reputation:

Add a new day column

Having this data:

dates
1990-02-02 01:00:00
1990-02-02 02:00:00
1990-02-03 01:00:00
1990-02-03 02:00:00
1990-02-04 01:00:00
1990-02-04 02:00:00

Could I create a column, in which every row has the number of day like this?

number dates
1      1990-02-02 01:00:00
1      1990-02-02 02:00:00
2      1990-02-03 01:00:00
2      1990-02-03 02:00:00
3      1990-02-04 01:00:00
3      1990-02-04 02:00:00

Upvotes: 1

Views: 96

Answers (3)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

This solution assumes that the dates are listed in chronological order (or are at least sorted by day) and that the dates are actually formatted as date time objects.

temp <- data.frame(dates = c('1990-02-02 01:00:00',
                             '1990-02-02 02:00:00',
                             '1990-02-03 01:00:00',
                             '1990-02-03 02:00:00',
                             '1990-02-04 01:00:00',
                             '1990-02-04 02:00:00',
                             '1990-02-04 03:00:00'))
temp$dates <- as.POSIXct(temp$dates, tz = "GMT")
x <- table(as.Date(temp$dates))
temp$number <- rep(seq_along(x), x)
temp
#                 dates number
# 1 1990-02-02 01:00:00      1
# 2 1990-02-02 02:00:00      1
# 3 1990-02-03 01:00:00      2
# 4 1990-02-03 02:00:00      2
# 5 1990-02-04 01:00:00      3
# 6 1990-02-04 02:00:00      3
# 7 1990-02-04 03:00:00      3

The basic idea is to just strip the time out (using as.Date) and tabulate the frequency of each day. You can then use rep on that output to create your "number" variable.


Slap forehead and post a simpler solution

Just use factor on as.Date. This would work even on a data.frame where the data are not ordered:

temp <- data.frame(dates = c('1990-02-02 01:00:00',
                             '1990-02-02 02:00:00',
                             '1990-02-03 01:00:00',
                             '1990-02-03 02:00:00',
                             '1990-02-04 01:00:00',
                             '1990-02-04 02:00:00',
                             '1990-02-04 03:00:00'))
temp$dates <- as.POSIXct(temp$dates, tz = "GMT")
within(temp, {
  counts <- as.numeric(factor(as.Date(dates)))
})
#                 dates counts
# 1 1990-02-02 01:00:00      1
# 2 1990-02-02 02:00:00      1
# 3 1990-02-03 01:00:00      2
# 4 1990-02-03 02:00:00      2
# 5 1990-02-04 01:00:00      3
# 6 1990-02-04 02:00:00      3
# 7 1990-02-04 03:00:00      3

Upvotes: 2

Steph Locke
Steph Locke

Reputation: 6146

The rank function should do what you need but consult the documentation for it as there's not an easily reproducible exampe for me to play with

temp <- data.frame(dates = c('1990-02-02 01:00:00',
                             '1990-02-02 02:00:00',
                             '1990-02-03 01:00:00',
                             '1990-02-03 02:00:00',
                             '1990-02-04 01:00:00',
                             '1990-02-04 02:00:00',
                             '1990-02-04 03:00:00'))
temp$datesonly<-as.Date(temp$dates)
temp2<-data.frame(dates=unique(temp$datesonly),ranks=rank(unique(temp$datesonly),ties.method="first"))
temp<-merge(temp,temp2,by.x="datesonly",by.y="dates")

Upvotes: 2

user2234497
user2234497

Reputation: 19

Tries with this...

- library(lubridate)

date <- c("1990-02-02 01:00:00",
"1990-02-02 02:00:00",
"1990-02-03 01:00:00",
"1990-02-03 02:00:00",
"1990-02-04 01:00:00",
"1990-02-04 02:00:00")

number <- day(date)
cbind.data.frame(number,date)

Upvotes: 0

Related Questions