Tan
Tan

Reputation: 117

Create rows of data conditionally

A sample dataset:

Price=c(6651, 7255, 25465, 35645, 2556, 3665)
NumberPurchased=c(25, 30, 156, 250, 12, 16)
Type=c("A", "A", "C", "C", "B", "B")
Source=c("GSC", "MYL", "TTC", "ZAF", "CAN", "HLT")
df1 <- data.frame(Price, NumberPurchased, Type, Source)

I want to be able to create a new dataframe with two additional variables (ID, PurchaseDate) but more rows of data based on variable Type.

The rules I want to apply: If Type=A, PurchaseDate is "2013", "2014". If Type=B, PurchaseDate is "2013". If Type=C, PurchaseDate is "2013", "2014", "2015".

If Type is A, divide Price and NumberPurchased by 2, and have 2 rows with different PurchaseDate as specified above. If Type is B, leave as is with PurhcaseDate as 2013. If Type is C, divide Price and NumberPurchased by 3, and have 3 rows with different PurchaseDate as specified above.

Therefore, I want something like this as a new dataset:

Price=c(3325.5, 3325.5, 3627.5, 3627.5, 8488.3, 8488.3, 8488.3, 11881.6, 11881.6, 11881.6, 2556, 3665)
NumberPurchased=c(12.5, 12.5, 15, 15, 52, 52, 52, 83.3, 83.3, 83.3, 12, 16)
Type=c("A", "A", "A", "A", "C", "C", "C", "C", "C", "C","B", "B")
Source=c("GSC", "GSC", "MYL", "MYL", "TTC","TTC", "TTC",  "ZAF", "ZAF","ZAF", "CAN", "HLT")
PurchaseDate=c("2013", "2014", "2013", "2014", "2013", "2014", "2015", "2013", "2014", "2015", "2013", "2013")
ID=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6)
df2 <- data.frame(Price, NumberPurchased, Type, Source, PurchaseDate, ID)

Any insight?

Upvotes: 1

Views: 40

Answers (1)

David Arenburg
David Arenburg

Reputation: 92292

Here's a possible approach. First, we will create an index for Type, then we will grow the data accordingly, then we will use the data.table package in order to compute the new variables.

library(data.table)
setDT(df1)[, indx := as.numeric(factor(Type, levels = c("B", "A", "C")))]
# setDT(df1)[, indx := ifelse(Type == "C", 3, 2)] # Alternative index per your comment
df2 <- df1[rep(seq_len(.N), indx)]

df2[, `:=`(
            Price = Price/.N,
            PurchaseDate = 2013:(2013 + (.N - 1)),
            NumberPurchased = NumberPurchased/.N,
            ID = .GRP
           ), 
           by = .(Source, Type)][]

#         Price NumberPurchased Type Source indx PurchaseDate ID
#  1:  3325.500        12.50000    A    GSC    2         2013  1
#  2:  3325.500        12.50000    A    GSC    2         2014  1
#  3:  3627.500        15.00000    A    MYL    2         2013  2
#  4:  3627.500        15.00000    A    MYL    2         2014  2
#  5:  8488.333        52.00000    C    TTC    3         2013  3
#  6:  8488.333        52.00000    C    TTC    3         2014  3
#  7:  8488.333        52.00000    C    TTC    3         2015  3
#  8: 11881.667        83.33333    C    ZAF    3         2013  4
#  9: 11881.667        83.33333    C    ZAF    3         2014  4
# 10: 11881.667        83.33333    C    ZAF    3         2015  4
# 11:  2556.000        12.00000    B    CAN    1         2013  5
# 12:  3665.000        16.00000    B    HLT    1         2013  6

Upvotes: 2

Related Questions