Reputation: 85
Sample data:
Date <- as.Date(c('1-01-2008','2-01-2008', '3-01-2008', '1-01-2008','2-01- 2008', '3-01-2008','1-01-2008','2-01-2008', '3-01-2008', '1-01-2008','2-01-2008', '3-01-2008'), format = "%m-%d-%Y")
Country <- c('US', 'US','US', 'JP', 'JP', 'JP', 'US', 'US','US', 'JP', 'JP', 'JP')
Category <- c('Apple', 'Apple', 'Apple', 'Apple', 'Apple', 'Apple', 'foo', 'foo','foo', 'foo','foo', 'foo')
Value <- c(runif(12, -0.5, 10))
df <- data.frame(Date, Country, Category, Value)
What I want to do is subtract for each Month and within each country the values of Apple and foo (so for US and 2008-01-01 the value would be -1.2357797). However I want to insert the result as a row, with Category name e.g. "diff".
I figured out how to do this with dplyr/mutate but only by inserting a whole new column, in which case the table doesn't make sense anymore (as the categories don't fit and I am converting to list later on):
df <- df %>%
group_by(Country, Date) %>%
mutate(
diff = Value[Category=="Apple"] - Value[Category=="foo"])
Edit: Important: I want to add multiple transformations in this way, not just the diff mentioned in the example.
Edit2: Thanks for all the helpful replies. Using the tidyr/dplyr method suggested by @akrun I will use this to insert more transformations:
library(tidyr)
library(dplyr)
df <- spread(df, Category, Value) %>%
mutate(diff=Apple- foo, xyz = Apple+foo) %>%
gather(Category, Value, Apple:diff, Apple:xyz)
Upvotes: 1
Views: 67
Reputation: 887951
We can either use data.table
or dplyr
. With data.table
, convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'Date', 'Country', we get the difference of 'Value' for corresponding 'Apple' and 'foo' values in 'Category' column, and also create the 'Category' with 'diff' as value. This will be a new summarised dataset ('dfN') which we can rbind
with the original dataset using rbindlist
. If needed, we can order
by 'Date', 'Country'
library(data.table)
dfN <- setDT(df)[,
list(Category="diff",
Value=Value[Category=="Apple"]- Value[Category=="foo"]),
by = .(Date, Country)]
rbindlist(list(df, dfN))[order(Date,Country)]
# Date Country Category Value
# 1: 2008-01-01 JP Apple 9.8861949
# 2: 2008-01-01 JP foo 6.8009149
# 3: 2008-01-01 JP diff 3.0852799
# 4: 2008-01-01 US Apple -0.3047560
# 5: 2008-01-01 US foo 9.1748432
# 6: 2008-01-01 US diff -9.4795991
# 7: 2008-02-01 JP Apple 8.7836616
# 8: 2008-02-01 JP foo 5.4775849
# 9: 2008-02-01 JP diff 3.3060767
#10: 2008-02-01 US Apple 1.6155057
#11: 2008-02-01 US foo 3.6720346
#12: 2008-02-01 US diff -2.0565289
#13: 2008-03-01 JP Apple 1.9879906
#14: 2008-03-01 JP foo 7.1387297
#15: 2008-03-01 JP diff -5.1507391
#16: 2008-03-01 US Apple 1.1435151
#17: 2008-03-01 US foo 0.6596238
#18: 2008-03-01 US diff 0.4838913
Or another option is using dcast/melt
from data.table
melt(dcast(setDT(df), Date+Country~Category,
value.var='Value')[, diff:= Apple-foo],
id.var=c('Date', 'Country'))
If we use the gather/spread
option (similar to melt/dcast
) from tidyr
,
library(tidyr)
library(dplyr)
spread(df, Category, Value) %>%
mutate(diff=Apple- foo) %>%
gather(Category, Value, Apple:diff)
Or use dplyr
, we use the same technique, instead of rbindlist
, we do with bind_rows
.
library(dplyr)
df %>%
group_by(Country, Date) %>%
summarise(Value = Value[Category=="Apple"] -
Value[Category=="foo"],
Category= "diff") %>%
bind_rows(df, .) %>%
arrange(Date, Country)
# Date Country Category Value
# (date) (fctr) (chr) (dbl)
#1 2008-01-01 JP Apple 9.8861949
#2 2008-01-01 JP foo 6.8009149
#3 2008-01-01 JP diff 3.0852799
#4 2008-01-01 US Apple -0.3047560
#5 2008-01-01 US foo 9.1748432
#6 2008-01-01 US diff -9.4795991
#7 2008-02-01 JP Apple 8.7836616
#8 2008-02-01 JP foo 5.4775849
#9 2008-02-01 JP diff 3.3060767
#10 2008-02-01 US Apple 1.6155057
#11 2008-02-01 US foo 3.6720346
#12 2008-02-01 US diff -2.0565289
#13 2008-03-01 JP Apple 1.9879906
#14 2008-03-01 JP foo 7.1387297
#15 2008-03-01 JP diff -5.1507391
#16 2008-03-01 US Apple 1.1435151
#17 2008-03-01 US foo 0.6596238
#18 2008-03-01 US diff 0.4838913
Upvotes: 3