Reputation: 313
I have a large data set that spanned a month in time with the data stamped in a column called txn_date
like the below. (this is a toy reproduction of it)
dat1 <- read.table(text = "var1 txn_date
5 2020-10-25
1 2020-10-25
3 2020-10-26
4 2020-10-27
1 2020-10-27
3 2020-10-31
3 2020-11-01
8 2020-11-02 ", header = TRUE)
Ideally I would like to get a column in my data frame for each date in the data which I think could be done by first getting a single column that is 1 for the first date that appears and then so on.
So something like this
dat1 <- read.table(text = "var1 txn_date day
5 2020-10-25 1
1 2020-10-25 1
3 2020-10-26 2
4 2020-10-27 3
1 2020-10-27 3
3 2020-10-31 7
3 2020-11-01 8
8 2020-11-12 9 ", header = TRUE
I'm not quite sure how to get this. The txn_date
column is as.Date
in my actual data frame. I think if I could get the single day
column like is listed above (then convert it to a factor) then I could always one hot encode the actual levels of that column if I need to. Ultimately I need to use the day of the experiment as a regressor in a regression I'm going to run.
Something along the lines of y ~ x + day_1 + day_2 +...+ error
Upvotes: 2
Views: 76
Reputation: 26690
Would this be suitable?
library(tidyverse)
dat1 <- read.table(text = "var1 txn_date
5 2020-10-25
1 2020-10-25
3 2020-10-26
4 2020-10-27
1 2020-10-27
3 2020-10-31
3 2020-11-01
8 2020-11-02 ", header = TRUE)
dat1$txn_date <- as.Date(dat1$txn_date)
dat1 %>%
mutate(days = txn_date - txn_date[1] + 1)
# var1 txn_date days
#1 5 2020-10-25 1 days
#2 1 2020-10-25 1 days
#3 3 2020-10-26 2 days
#4 4 2020-10-27 3 days
#5 1 2020-10-27 3 days
#6 3 2020-10-31 7 days
#7 3 2020-11-01 8 days
#8 8 2020-11-02 9 days
Upvotes: 2
Reputation: 887501
We create a sequence of dates based on the min
and max
of 'txn_date' and match
dates <- seq(min(as.Date(dat1$txn_date)),
max(as.Date(dat1$txn_date)), by = '1 day')
dat1$day <- with(dat1, match(as.Date(txn_date), dates))
dat1$day
#[1] 1 1 2 3 3 7 8 9
Or may use factor
route
with(dat1, as.integer(factor(txn_date, levels = as.character(dates))))
#[1] 1 1 2 3 3 7 8 9
Upvotes: 1