Reputation: 389
To start off, thank you for taking the time to view/answer my question. I will try my best to explain it (hopefully it isn't too difficult; I am not an expert in R by any means).
Let's suppose I have the data below. The first column is the Date; the second column is "level", a repeating sequence from 2 to 8 (2:8) for each day; var3 is just some statistic.
Date level var3
1 2/10/2017 2 0.2340
2 2/10/2017 3 0.1240
3 2/10/2017 4 0.5120
4 2/10/2017 5 0.4440
5 2/10/2017 6 0.1200
6 2/10/2017 7 0.5213
7 2/10/2017 8 0.1200
8 2/11/2017 2 0.4100
9 2/11/2017 3 0.6500
10 2/11/2017 4 0.2400
11 2/11/2017 5 0.5500
13 2/11/2017 6 0.3100
14 2/11/2017 7 0.1500
15 2/11/2017 8 0.2300
16 2/12/2017 2 0.1500
17 2/12/2017 3 0.5800
18 2/12/2017 4 0.3300
19 2/12/2017 5 0.2100
20 2/12/2017 6 0.9800
21 2/12/2017 7 0.3200
22 2/12/2017 8 0.1800
My goal is to standardize the data by doing the following:
- Create a new column called 'Change'
- For each unique date, Change is log(var3) - log(var3[level == 5])
Essentially, for each unique date, I want to take the log of each row's var3 and subtract the log of the level 5 var3 value for that day. For example, change[1] = log(.2340) - log(.4440) and change[2] = log(.1240) - log(.4440), but change[10] would be log(.2400) - log(.5500), and so on.
I am having trouble coding this in R. Below is the code I came up with, but the result seems to be 21 rows x 24 variables, when I really just want 21 rows and 4 columns, with the 4th one being "Change". I just can't get it :/
log_mean <- function(data_set) {
  for (i in unique(data_set$Date)) {
    midpoint <- data_set$var3[data_set$level == 5]
    c <- (log(data_set$var3) - log(midpoint))
    change <- rbind(change, c)
  }
}
y <- cbind(x, change)
Please help if you can. Intuitively it seems really easy to do, but I am not sure how to do it in R (and yes, I am relatively new to it).
Thank you so much!
Upvotes: 1
Views: 183
Reputation: 11514
Try this:
library(dplyr)
df %>% group_by(Date) %>% mutate(change = log(var3) - log(var3[level==5]))
# A tibble: 21 x 4
# Groups: Date [3]
Date level var3 change
<fct> <int> <dbl> <dbl>
1 2/10/2017 2 0.234 -0.641
2 2/10/2017 3 0.124 -1.28
3 2/10/2017 4 0.512 0.143
4 2/10/2017 5 0.444 0
5 2/10/2017 6 0.12 -1.31
6 2/10/2017 7 0.521 0.161
7 2/10/2017 8 0.12 -1.31
8 2/11/2017 2 0.41 -0.294
9 2/11/2017 3 0.65 0.167
10 2/11/2017 4 0.24 -0.829
# ... with 11 more rows
In general, this falls into the category split-apply-combine. Google the term and familiarize yourself with the options that R offers you (e.g. base, dplyr, data.table). It will come in handy in the future.
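For comparison, here is a rough base R sketch of the same split-apply-combine idea, with no packages needed. The data frame is rebuilt from the values in the question. The names `pieces`, `result`, `ref` and `change_lookup` are just illustrative choices of mine, and this is one possible way to write it rather than the only one.

```r
# Rebuild the question's data: levels 2..8 for each of three dates (21 rows).
df <- data.frame(
  Date  = rep(c("2/10/2017", "2/11/2017", "2/12/2017"), each = 7),
  level = rep(2:8, times = 3),
  var3  = c(0.2340, 0.1240, 0.5120, 0.4440, 0.1200, 0.5213, 0.1200,
            0.4100, 0.6500, 0.2400, 0.5500, 0.3100, 0.1500, 0.2300,
            0.1500, 0.5800, 0.3300, 0.2100, 0.9800, 0.3200, 0.1800)
)

# Option 1: split-apply-combine spelled out step by step.
pieces <- split(df, df$Date)                 # split: one data frame per date
pieces <- lapply(pieces, function(d) {       # apply: use that date's level 5 value
  d$change <- log(d$var3) - log(d$var3[d$level == 5])
  d
})
result <- unsplit(pieces, df$Date)           # combine: back in the original row order

# Option 2: build a per-date lookup of the level 5 values, then index it by date.
ref <- df$var3[df$level == 5]
names(ref) <- as.character(df$Date[df$level == 5])
change_lookup <- unname(log(df$var3) - log(ref[as.character(df$Date)]))

dim(result)   # 21 rows, 4 columns: Date, level, var3, change
```

Both options assume every date has exactly one level 5 row. If a date had none, or more than one, the subtraction would fail or recycle values, so it may be worth checking for that first.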
Upvotes: 1