Reputation: 1036
I have a data frame which has a date column, a column of ints (labelled value in the example below), and 12 other numeric columns, each corresponding to a month and labelled X1 (Jan) through X12 (Dec).
It looks something like:
date_var value X1 X2 X3 ... X12
2016-01-01 100 1212 4161 9080 ... 383
2016-02-01 150 1212 4161 9080 ... 383
2016-03-01 150 1212 4161 9080 ... 383
What I'd like to do is create a new column, let's call it Z, which corresponds to the number in the value column divided by the appropriate monthly value.
For example, in the table above Z for the 2016-01-01 entry would equal 100/1212, whereas the 2016-02-01 entry would instead divide by X2 for Feb, and the 2016-03-01 value would be divided by X3:
date_var value X1 X2 X3 ... X12 Z
2016-01-01 100 1212 4161 9080 ... 383 0.0825
2016-02-01 150 1212 4161 9080 ... 383 0.0360
2016-03-01 150 1212 4161 9080 ... 383 0.0165
I've tried various approaches along the lines of dividing value by df[paste("X", month(df$date_var), sep = '')], although this returned a list rather than working element-wise, so it obviously isn't the correct approach.
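To see why that attempt fails, here is a small base R sketch. It mirrors the three-row example above and uses format(date_var, "%m") instead of lubridate's month() so it needs no packages. Indexing a data frame with a character vector of names selects whole columns, so the result is a data frame with one column per row of df rather than one value per row:

```r
df <- data.frame(
  date_var = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
  value = c(100, 150, 150),
  X1 = 1212, X2 = 4161, X3 = 9080, X12 = 383
)

# month number of each row: 1, 2, 3
mon <- as.integer(format(df$date_var, "%m"))

# df[c("X1", "X2", "X3")] selects three whole columns (a 3 x 3 data frame),
# not one value per row
picked <- df[paste("X", mon, sep = "")]
print(dim(picked))
print(class(picked))

# dividing by it recycles value down every column,
# giving a 3 x 3 data frame instead of a single Z column
print(df$value / picked)
```

What is needed instead is one (row, column) pair per row, which is what the answers below construct in different ways.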
Upvotes: 0
Views: 1511
Reputation: 2375
Another good way, using the dplyr and tidyr packages, is to take the usual R approach of converting your data to long format (i.e. the same kind of information in the same column, here all your X1-X12 values) and then use a filter condition to keep only the month values that match the month in your date variable:
library(dplyr)
library(tidyr)
library(lubridate)
# test data frame (code from parksw3)
data <- tibble(
date_var = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
value = c(100, 150, 150),
X1 = rep(1212, 3),
X2 = rep(4161, 3),
X3 = rep(9080, 3),
X12 = rep(383, 3)
)
# calculate the resulting Z column
result <- data %>%
# gather all the month (X1-X12) values into long format
# with month_var and month_value as key/value pair
gather(month_var, month_value, starts_with("X")) %>%
# only consider the month_value for the month_var that matches the date's month
filter(month_var == paste0("X", month(date_var))) %>%
# calculate the derived quantity
mutate(Z = value/month_value)
print(result)
## date_var value month_var month_value Z
## <date> <dbl> <chr> <dbl> <dbl>
## 1 2016-01-01 100 X1 1212 0.08250825
## 2 2016-02-01 150 X2 4161 0.03604903
## 3 2016-03-01 150 X3 9080 0.01651982
If you want, you can merge it back into your original data frame:
data_all <- left_join(data, select(result, date_var, Z), by = "date_var")
print(data_all)
## date_var value X1 X2 X3 X12 Z
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2016-01-01 100 1212 4161 9080 383 0.08250825
## 2 2016-02-01 150 1212 4161 9080 383 0.03604903
## 3 2016-03-01 150 1212 4161 9080 383 0.01651982
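For comparison, the same long-format idea can be sketched in base R with stack(), which gathers the chosen columns into a values column and an ind column (the source column name, as a factor). This is a swapped-in technique, not part of the dplyr/tidyr answer, and it assumes the month columns are the only ones named X followed by digits:

```r
df <- data.frame(
  date_var = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
  value = c(100, 150, 150),
  X1 = 1212, X2 = 4161, X3 = 9080, X12 = 383
)

month_cols <- grep("^X[0-9]+$", names(df), value = TRUE)

# long format: one row per (original row, month column) pair
long <- stack(df[month_cols])
# stack() goes column by column, so the original row ids repeat once per column
long$row_id <- rep(seq_len(nrow(df)), times = length(month_cols))

# keep only the month column that matches each row's date
row_month <- paste0("X", as.integer(format(df$date_var[long$row_id], "%m")))
long <- long[as.character(long$ind) == row_month, ]

# put Z back by row position; rows without a matching month column stay NA
df$Z <- NA_real_
df$Z[long$row_id] <- df$value[long$row_id] / long$values
print(df)
```

Using the row position instead of date_var as the key also avoids duplicated rows in the merge step if the same date appears more than once.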
Upvotes: 2
Reputation: 1427
As an exploration into the trials of R indexing, here is a pseudo-tidyverse answer.
First let's generate some dummy data.
library(tidyverse)
library(lubridate)  # for month()
data <- tibble(
date_var = seq(as.Date("2016-01-01"), by = "month", length.out = 12),
value = ceiling(runif(12, 100, 200))
)
data %>%
mutate(months = map(value, function(x){matrix(ceiling(runif(12, 50, 5000)), ncol = 12)}),
months = map(months, as_data_frame)) %>%
unnest(months) %>%
as.data.frame() ->
sample.data
head(sample.data)
#> date_var value V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
#> 1 2016-01-01 147 2004 2456 3983 4464 2473 2824 2038 1354 3433 51 574 1381
#> 2 2016-02-01 170 2862 3579 543 1458 2472 826 3865 528 187 951 4732 1849
#> 3 2016-03-01 107 2860 1359 4366 1824 173 3541 624 76 4113 771 808 3457
#> 4 2016-04-01 115 1707 4015 3951 2774 2726 1789 2189 1903 1706 124 3679 1876
#> 5 2016-05-01 120 1058 4169 2594 4334 221 494 2032 1425 2525 3358 791 3691
#> 6 2016-06-01 191 4169 570 3245 1682 3811 4350 2344 4338 2258 779 1835 2480
Now that we have some sample data, we can use matrix indexing (a two-column matrix giving one row/column pair per row) to extract, for each row, the value from the column matching its month. I'm assuming that the month columns are named V1 to V12 (as they are in my dataset).
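sample.data %>%
# pair each row number with the position of that row's month column
mutate(Z = .[cbind(seq_len(nrow(.)), match(sprintf("V%s", month(date_var)), names(.)))],
Z = as.numeric(Z),  # the lookup comes back as character, so convert
Z = value / Z) ->
result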
head(result)
#> date_var value V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 Z
#> 1 2016-01-01 147 2004 2456 3983 4464 2473 2824 2038 1354 3433 51 574 1381 0.07335329
#> 2 2016-02-01 170 2862 3579 543 1458 2472 826 3865 528 187 951 4732 1849 0.04749930
#> 3 2016-03-01 107 2860 1359 4366 1824 173 3541 624 76 4113 771 808 3457 0.02450756
#> 4 2016-04-01 115 1707 4015 3951 2774 2726 1789 2189 1903 1706 124 3679 1876 0.04145638
#> 5 2016-05-01 120 1058 4169 2594 4334 221 494 2032 1425 2525 3358 791 3691 0.54298643
#> 6 2016-06-01 191 4169 570 3245 1682 3811 4350 2344 4338 2258 779 1835 2480 0.04390805
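Note that the row index has to be seq_len(nrow(.)) rather than seq_along(nrow(.)): nrow() returns a single number, so seq_along() of it is just 1, and cbind(1, col) would read every month value from the first row. A minimal base R sketch on a plain numeric matrix (small made-up values, not the random data above) shows the difference:

```r
# 4 rows (Jan-Apr), month columns V1..V12 filled with made-up values
dates <- seq(as.Date("2016-01-01"), by = "month", length.out = 4)
m <- matrix(seq_len(4 * 12) * 100, nrow = 4,
            dimnames = list(NULL, paste0("V", 1:12)))
value <- c(10, 20, 30, 40)

col_idx <- as.integer(format(dates, "%m"))  # 1, 2, 3, 4

# seq_along(nrow(m)) is 1, so every lookup hits row 1
wrong <- m[cbind(seq_along(nrow(m)), col_idx)]
# seq_len(nrow(m)) pairs row i with its own month column
right <- m[cbind(seq_len(nrow(m)), col_idx)]

Z <- value / right
print(rbind(wrong = wrong, right = right))
print(Z)
```

Because m holds only numbers, the lookup stays numeric and no as.numeric() step is needed.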
Upvotes: 1
Reputation: 116
Not the most elegant way, but you can use a for loop (assuming the data is laid out as in the question):
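# data: your data frame, laid out as in the question
# (date in column 1, value in column 2, X1 to X12 in columns 3 to 14)
x <- as.numeric(format(data[, 1], "%m"))  # month number of each row
for (i in seq_len(nrow(data))) {
# month m is stored in column m + 2
data[i, "Z"] <- data[i, 2] / data[i, x[i] + 2]
}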
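The loop depends on X1 to X12 sitting in columns 3 to 14, in order. If some months are missing or the columns are ordered differently, looking each column up by name is a little safer. A minimal sketch, assuming the month columns are named X1 through X12 as in the question (here only some are present, and the third date is moved to December to exercise the lookup):

```r
df <- data.frame(
  date_var = as.Date(c("2016-01-01", "2016-02-01", "2016-12-01")),
  value = c(100, 150, 150),
  X1 = 1212, X2 = 4161, X3 = 9080, X12 = 383
)

# month number of each row, e.g. "12" -> 12
mon <- as.integer(format(df$date_var, "%m"))

df$Z <- NA_real_
for (i in seq_len(nrow(df))) {
  # look the month column up by name instead of by position;
  # assumes the needed X<month> column exists
  df$Z[i] <- df$value[i] / df[[paste0("X", mon[i])]][i]
}
print(df)
```

With the positional version, December would point at column 14, which does not exist in this smaller data frame.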
Upvotes: 0
Reputation: 659
Take a look at this post. I think there should be a simpler way, but here's what I did based on that post, and both methods seem to work:
Data:
library(lubridate)  # month() is used in both methods
df <- data.frame(
date_var = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
value = c(100, 150, 150),
X1 = rep(1212, 3),
X2 = rep(4161, 3),
X3 = rep(9080, 3),
X12 = rep(383, 3)
)
Method 1:
m <- paste0("X", month(df$date_var))
sub <- cbind(1:nrow(df),
match(m, names(df))
)
Z2 <- df$value/as.numeric(df[sub])
df2 <- cbind(df, Z2)
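The as.numeric() call in Method 1 is needed because indexing a data frame with a two-column matrix goes through as.matrix(), and with a Date column present that matrix is character. A small base R sketch (format(..., "%m") standing in for lubridate's month()) that indexes only the numeric month columns keeps the lookup numeric:

```r
df <- data.frame(
  date_var = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
  value = c(100, 150, 150),
  X1 = 1212, X2 = 4161, X3 = 9080, X12 = 383
)
m <- paste0("X", as.integer(format(df$date_var, "%m")))
sub <- cbind(seq_len(nrow(df)), match(m, names(df)))

# with the Date column included, matrix indexing returns character strings
print(class(df[sub]))   # "character"

# restricting to the month columns first gives a numeric matrix
month_mat <- as.matrix(df[grep("^X[0-9]+$", names(df))])
sub_num <- cbind(seq_len(nrow(df)), match(m, colnames(month_mat)))
Z2 <- df$value / month_mat[sub_num]
print(Z2)
```

Working on the numeric matrix also avoids any rounding that the character conversion could introduce for non-integer values.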
Method 2:
Z3 <- sapply(rownames(df), function(x){
with(df[x,],{
m <- month(date_var)
value/get(paste0("X", m))
})
})
df3 <- cbind(df, Z3)
Result:
## date_var value X1 X2 X3 X12 Z3
## 1 2016-01-01 100 1212 4161 9080 383 0.08250825
## 2 2016-02-01 150 1212 4161 9080 383 0.03604903
## 3 2016-03-01 150 1212 4161 9080 383 0.01651982
Upvotes: 1