Reputation: 117
A sample dataset:
Price=c(6651, 7255, 25465, 35645, 2556, 3665)
NumberPurchased=c(25, 30, 156, 250, 12, 16)
Type=c("A", "A", "C", "C", "B", "B")
Source=c("GSC", "MYL", "TTC", "ZAF", "CAN", "HLT")
df1 <- data.frame(Price, NumberPurchased, Type, Source)
I want to be able to create a new dataframe with two additional variables (ID
, PurchaseDate
) but more rows of data based on variable Type
.
The rules I want to apply:
If Type=A, PurchaseDate
is "2013", "2014".
If Type=B, PurchaseDate
is "2013".
If Type=C, PurchaseDate
is "2013", "2014", "2015".
If Type
is A, divide Price
and NumberPurchased
by 2, and have 2 rows with different PurchaseDate
as specified above.
If Type
is B, leave as is with PurhcaseDate
as 2013.
If Type
is C, divide Price
and NumberPurchased
by 3, and have 3 rows with different PurchaseDate
as specified above.
Therefore, I want something like this as a new dataset:
Price=c(3325.5, 3325.5, 3627.5, 3627.5, 8488.3, 8488.3, 8488.3, 11881.6, 11881.6, 11881.6, 2556, 3665)
NumberPurchased=c(12.5, 12.5, 15, 15, 52, 52, 52, 83.3, 83.3, 83.3, 12, 16)
Type=c("A", "A", "A", "A", "C", "C", "C", "C", "C", "C","B", "B")
Source=c("GSC", "GSC", "MYL", "MYL", "TTC","TTC", "TTC", "ZAF", "ZAF","ZAF", "CAN", "HLT")
PurchaseDate=c("2013", "2014", "2013", "2014", "2013", "2014", "2015", "2013", "2014", "2015", "2013", "2013")
ID=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6)
df2 <- data.frame(Price, NumberPurchased, Type, Source, PurchaseDate, ID)
Any insight?
Upvotes: 1
Views: 40
Reputation: 92292
Here's a possible approach. First, we will create an index for Type
, then we will grow the data accordingly, then we will use the data.table
package in order to compute the new variables.
library(data.table)
setDT(df1)[, indx := as.numeric(factor(Type, levels = c("B", "A", "C")))]
# setDT(df1)[, indx := ifelse(Type == "C", 3, 2)] # Alternative index per your comment
df2 <- df1[rep(seq_len(.N), indx)]
df2[, `:=`(
Price = Price/.N,
PurchaseDate = 2013:(2013 + (.N - 1)),
NumberPurchased = NumberPurchased/.N,
ID = .GRP
),
by = .(Source, Type)][]
# Price NumberPurchased Type Source indx PurchaseDate ID
# 1: 3325.500 12.50000 A GSC 2 2013 1
# 2: 3325.500 12.50000 A GSC 2 2014 1
# 3: 3627.500 15.00000 A MYL 2 2013 2
# 4: 3627.500 15.00000 A MYL 2 2014 2
# 5: 8488.333 52.00000 C TTC 3 2013 3
# 6: 8488.333 52.00000 C TTC 3 2014 3
# 7: 8488.333 52.00000 C TTC 3 2015 3
# 8: 11881.667 83.33333 C ZAF 3 2013 4
# 9: 11881.667 83.33333 C ZAF 3 2014 4
# 10: 11881.667 83.33333 C ZAF 3 2015 4
# 11: 2556.000 12.00000 B CAN 1 2013 5
# 12: 3665.000 16.00000 B HLT 1 2013 6
Upvotes: 2