Dreica
Dreica

Reputation: 85

Insert transformation as new row rather than column

Sample data:

Date <- as.Date(c('1-01-2008','2-01-2008', '3-01-2008', '1-01-2008','2-01-    2008', '3-01-2008','1-01-2008','2-01-2008', '3-01-2008', '1-01-2008','2-01-2008', '3-01-2008'), format = "%m-%d-%Y") 
Country <- c('US', 'US','US', 'JP', 'JP', 'JP', 'US', 'US','US', 'JP', 'JP', 'JP') 
Category <- c('Apple', 'Apple', 'Apple', 'Apple', 'Apple', 'Apple', 'foo', 'foo','foo', 'foo','foo', 'foo') 
Value <- c(runif(12, -0.5, 10))
df <- data.frame(Date, Country, Category, Value)

What I want to do is subtract for each Month and within each country the values of Apple and foo (so for US and 2008-01-01 the value would be -1.2357797). However I want to insert the result as a row, with Category name e.g. "diff".

I figured out how to do this with dplyr/mutate but only by inserting a whole new column, in which case the table doesn't make sense anymore (as the categories don't fit and I am converting to list later on):

df <- df %>%
 group_by(Country, Date) %>%
    mutate(
      diff = Value[Category=="Apple"] - Value[Category=="foo"])

Edit: Important: I want to add multiple transformations in this way, not just the diff mentioned in the example.

Edit2: Thanks for all the helpful replies. Using the tidyr/dplyr method suggested by @akrun I will use this to insert more transformations:

library(tidyr)
library(dplyr)
 df <- spread(df, Category, Value) %>%
    mutate(diff=Apple- foo, xyz = Apple+foo) %>% 
    gather(Category, Value, Apple:diff, Apple:xyz)

Upvotes: 1

Views: 67

Answers (1)

akrun
akrun

Reputation: 887951

We can either use data.table or dplyr. With data.table, convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'Date', 'Country', we get the difference of 'Value' for corresponding 'Apple' and 'foo' values in 'Category' column, and also create the 'Category' with 'diff' as value. This will be a new summarised dataset ('dfN') which we can rbind with the original dataset using rbindlist. If needed, we can order by 'Date', 'Country'

library(data.table)
dfN <- setDT(df)[, 
    list(Category="diff",
         Value=Value[Category=="Apple"]- Value[Category=="foo"]), 
               by = .(Date, Country)]
rbindlist(list(df, dfN))[order(Date,Country)]
#       Date Country Category      Value
# 1: 2008-01-01      JP    Apple  9.8861949
# 2: 2008-01-01      JP      foo  6.8009149
# 3: 2008-01-01      JP     diff  3.0852799
# 4: 2008-01-01      US    Apple -0.3047560
# 5: 2008-01-01      US      foo  9.1748432
# 6: 2008-01-01      US     diff -9.4795991
# 7: 2008-02-01      JP    Apple  8.7836616
# 8: 2008-02-01      JP      foo  5.4775849
# 9: 2008-02-01      JP     diff  3.3060767
#10: 2008-02-01      US    Apple  1.6155057
#11: 2008-02-01      US      foo  3.6720346
#12: 2008-02-01      US     diff -2.0565289
#13: 2008-03-01      JP    Apple  1.9879906
#14: 2008-03-01      JP      foo  7.1387297
#15: 2008-03-01      JP     diff -5.1507391
#16: 2008-03-01      US    Apple  1.1435151
#17: 2008-03-01      US      foo  0.6596238
#18: 2008-03-01      US     diff  0.4838913

Or another option is using dcast/melt from data.table

melt(dcast(setDT(df), Date+Country~Category, 
    value.var='Value')[, diff:= Apple-foo], 
      id.var=c('Date', 'Country'))

If we use the gather/spread option (similar to melt/dcast) from tidyr,

library(tidyr)
library(dplyr)
spread(df, Category, Value) %>%
       mutate(diff=Apple- foo) %>% 
       gather(Category, Value, Apple:diff)

Or use dplyr, we use the same technique, instead of rbindlist, we do with bind_rows.

 library(dplyr)
 df %>%
     group_by(Country, Date) %>%
     summarise(Value =  Value[Category=="Apple"] - 
                        Value[Category=="foo"],
                Category= "diff") %>%
     bind_rows(df, .) %>%
     arrange(Date, Country)
#        Date Country Category      Value
#       (date)  (fctr)    (chr)      (dbl)
#1  2008-01-01      JP    Apple  9.8861949
#2  2008-01-01      JP      foo  6.8009149
#3  2008-01-01      JP     diff  3.0852799
#4  2008-01-01      US    Apple -0.3047560
#5  2008-01-01      US      foo  9.1748432
#6  2008-01-01      US     diff -9.4795991
#7  2008-02-01      JP    Apple  8.7836616
#8  2008-02-01      JP      foo  5.4775849
#9  2008-02-01      JP     diff  3.3060767
#10 2008-02-01      US    Apple  1.6155057
#11 2008-02-01      US      foo  3.6720346
#12 2008-02-01      US     diff -2.0565289
#13 2008-03-01      JP    Apple  1.9879906
#14 2008-03-01      JP      foo  7.1387297
#15 2008-03-01      JP     diff -5.1507391
#16 2008-03-01      US    Apple  1.1435151
#17 2008-03-01      US      foo  0.6596238
#18 2008-03-01      US     diff  0.4838913

Upvotes: 3

Related Questions