Sam Pickwick
Sam Pickwick

Reputation: 313

Create variable for day of the experiment

I have a large data set that spanned a month in time with the data stamped in a column called txn_date like the below. (this is a toy reproduction of it)

dat1 <- read.table(text = "var1 txn_date 
5 2020-10-25
1 2020-10-25
3 2020-10-26
4 2020-10-27
1 2020-10-27 
3 2020-10-31  
3 2020-11-01 
8 2020-11-02 ", header = TRUE)

Ideally I would like to get a column in my data frame for each date in the data which I think could be done by first getting a single column that is 1 for the first date that appears and then so on.

So something like this

dat1 <- read.table(text = "var1 txn_date day
5 2020-10-25 1
1 2020-10-25 1
3 2020-10-26 2
4 2020-10-27 3
1 2020-10-27 3
3 2020-10-31 7  
3 2020-11-01 8
8 2020-11-12 9 ", header = TRUE

I'm not quite sure how to get this. The txn_date column is as.Date in my actual data frame. I think if I could get the single day column like is listed above (then convert it to a factor) then I could always one hot encode the actual levels of that column if I need to. Ultimately I need to use the day of the experiment as a regressor in a regression I'm going to run.

Something along the lines of y ~ x + day_1 + day_2 +...+ error

Upvotes: 2

Views: 76

Answers (2)

jared_mamrot
jared_mamrot

Reputation: 26690

Would this be suitable?

library(tidyverse)
dat1 <- read.table(text = "var1 txn_date 
5 2020-10-25
1 2020-10-25
3 2020-10-26
4 2020-10-27
1 2020-10-27 
3 2020-10-31  
3 2020-11-01 
8 2020-11-02 ", header = TRUE)

dat1$txn_date <- as.Date(dat1$txn_date)
dat1 %>% 
  mutate(days = txn_date - txn_date[1] + 1)
#  var1   txn_date   days
#1    5 2020-10-25 1 days
#2    1 2020-10-25 1 days
#3    3 2020-10-26 2 days
#4    4 2020-10-27 3 days
#5    1 2020-10-27 3 days
#6    3 2020-10-31 7 days
#7    3 2020-11-01 8 days
#8    8 2020-11-02 9 days

Upvotes: 2

akrun
akrun

Reputation: 887501

We create a sequence of dates based on the min and max of 'txn_date' and match

dates <- seq(min(as.Date(dat1$txn_date)), 
           max(as.Date(dat1$txn_date)), by = '1 day')
dat1$day <- with(dat1, match(as.Date(txn_date), dates))
dat1$day
#[1] 1 1 2 3 3 7 8 9

Or may use factor route

with(dat1, as.integer(factor(txn_date, levels = as.character(dates))))
#[1] 1 1 2 3 3 7 8 9

Upvotes: 1

Related Questions