yungpadewon

Reputation: 389

How do I standardize data differently for different dates in my dataframe in R?

To start off, thank you for taking the time to view/answer my question. I will try my best to explain it (hopefully it isn't too difficult; I am not an expert in R by any means).

Let's suppose I have the data below. The first column is the Date; the second column, "level", is a repeating sequence from 2 to 8 for each day; var3 is just some statistic.

      Date     level  var3
1  2/10/2017     2   0.2340
2  2/10/2017     3   0.1240
3  2/10/2017     4   0.5120
4  2/10/2017     5   0.4440
5  2/10/2017     6   0.1200
6  2/10/2017     7   0.5213
7  2/10/2017     8   0.1200
8  2/11/2017     2   0.4100
9  2/11/2017     3   0.6500
10 2/11/2017     4   0.2400
11 2/11/2017     5   0.5500
13 2/11/2017     6   0.3100
14 2/11/2017     7   0.1500
15 2/11/2017     8   0.2300
16 2/12/2017     2   0.1500
17 2/12/2017     3   0.5800
18 2/12/2017     4   0.3300
19 2/12/2017     5   0.2100
20 2/12/2017     6   0.9800
21 2/12/2017     7   0.3200
22 2/12/2017     8   0.1800

My goal is to standardize the data BY doing the following:

- Create a new column called 'Change'
- For each unique date, Change is log(var3) - log(var3[level == 5])

Essentially, for each unique date, I want to take the log of var3 in every row and subtract the log of the level 5 value of var3 FOR THAT DAY. So, for example, change[1] = log(.2340) - log(.4440) and change[2] = log(.1240) - log(.4440), but change[10] would be log(.2400) - log(.5500), and so on.

I am having trouble coding this in R. Below is the code I came up with. The result seems to be 21 rows x 24 vars, but I really just want 21 rows and 4 columns, with the 4th one being "Change", and I just can't get it:

     log_mean <- function(data_set) {
     for (i in unique(data_set$Date) {
     midpoint <- data_set$var3[data_set$level == 5]
     c <- (log(data_set$var3) - log(midpoint))
     change <- rbind(change,c)}}
     y <- cbind(x, change)

Please help if you can. Intuitively it seems really easy, but I am not sure how to do this in R (and yes, I am relatively new to it).

Thank you so much!

Upvotes: 1

Views: 183

Answers (1)

coffeinjunky

Reputation: 11514

Try this:

    library(dplyr)
    df %>% group_by(Date) %>% mutate(change = log(var3) - log(var3[level==5]))
    # A tibble: 21 x 4
    # Groups:   Date [3]
       Date      level  var3 change
       <fct>     <int> <dbl>  <dbl>
     1 2/10/2017     2 0.234 -0.641
     2 2/10/2017     3 0.124 -1.28
     3 2/10/2017     4 0.512  0.143
     4 2/10/2017     5 0.444  0
     5 2/10/2017     6 0.12  -1.31
     6 2/10/2017     7 0.521  0.161
     7 2/10/2017     8 0.12  -1.31
     8 2/11/2017     2 0.41  -0.294
     9 2/11/2017     3 0.65   0.167
    10 2/11/2017     4 0.24  -0.829
    # ... with 11 more rows
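
For comparison, here is a hedged sketch of how the loop from the question could be repaired in base R. The likely problems in the original are that the loop variable is never used to subset the data (so `midpoint` picks up the level 5 value of every date at once), that `change` is never initialised, and that the function returns nothing. The name `log_change` and the rebuilt `df` are illustrative assumptions, not part of the original post.

```r
# Rebuild the question's data (21 rows: 3 dates x levels 2..8).
df <- data.frame(
  Date  = rep(c("2/10/2017", "2/11/2017", "2/12/2017"), each = 7),
  level = rep(2:8, times = 3),
  var3  = c(0.2340, 0.1240, 0.5120, 0.4440, 0.1200, 0.5213, 0.1200,
            0.4100, 0.6500, 0.2400, 0.5500, 0.3100, 0.1500, 0.2300,
            0.1500, 0.5800, 0.3300, 0.2100, 0.9800, 0.3200, 0.1800)
)

# Hypothetical repaired version of the question's loop.
log_change <- function(data_set) {
  change <- numeric(nrow(data_set))            # one slot per row
  for (d in unique(data_set$Date)) {
    rows <- data_set$Date == d                 # rows belonging to this date
    # Level 5 value for this date only (assumes exactly one such row).
    midpoint <- data_set$var3[rows & data_set$level == 5]
    change[rows] <- log(data_set$var3[rows]) - log(midpoint)
  }
  cbind(data_set, change = change)             # original columns + change
}

out <- log_change(df)
dim(out)   # 21 rows, 4 columns
```

This assumes each date has exactly one level 5 row; a date with none, or with several, would need separate handling.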

In general, this falls into the split-apply-combine category. Search for the term and familiarize yourself with the options that R offers (e.g. base, dplyr, data.table). It will come in handy in the future.
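
As a rough illustration of what split-apply-combine means, the same calculation can be spelled out step by step in base R with `split()`, `lapply()` and `rbind()`. This is only a sketch; the data frame is rebuilt here so the snippet runs on its own, and the names `pieces` and `out` are illustrative assumptions.

```r
# Rebuild the question's data (21 rows: 3 dates x levels 2..8).
df <- data.frame(
  Date  = rep(c("2/10/2017", "2/11/2017", "2/12/2017"), each = 7),
  level = rep(2:8, times = 3),
  var3  = c(0.2340, 0.1240, 0.5120, 0.4440, 0.1200, 0.5213, 0.1200,
            0.4100, 0.6500, 0.2400, 0.5500, 0.3100, 0.1500, 0.2300,
            0.1500, 0.5800, 0.3300, 0.2100, 0.9800, 0.3200, 0.1800)
)

# Split: one data frame per date.
pieces <- split(df, df$Date)

# Apply: within each date, compare every row with that date's level 5 row.
pieces <- lapply(pieces, function(d) {
  d$change <- log(d$var3) - log(d$var3[d$level == 5])
  d
})

# Combine: stack the pieces back into a single data frame.
out <- do.call(rbind, pieces)
rownames(out) <- NULL   # drop the "date.row" names that rbind generates
```

The dplyr pipeline above does the same three steps implicitly: `group_by()` handles the split and `mutate()` applies the expression per group and recombines. With data.table, the equivalent should be along the lines of `dt[, change := log(var3) - log(var3[level == 5]), by = Date]`, assuming `dt` is a data.table holding the same columns.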

Upvotes: 1
