Reputation: 41

Merge two rows into one based on two identifiers

I have a dataset that contains the information on the bought and sold prices of different products. However, rather than storing the bought and sold for price in the same row, it is stored in two separate rows, which are identified by a bought and sold variable, as shown below.

Product|Product Type|Price|Bought|Sold
---------------------------------------
Apples |   Green    |  1  |   0  |  1
---------------------------------------
Apples |   Green    |  2  |   1  |  0
---------------------------------------
Apples |   Red      |  3  |   0  |  1
---------------------------------------
Apples |   Red      |  4  |   1  |  0
---------------------------------------

I want to join the bought and sold price into one row, so it looks a little something like this:

Product|Product Type|Bought Price|Sold Price
---------------------------------------------
Apples |   Green    |      1     |    2
---------------------------------------------
Apples |   Red      |      4     |    3

Here is the code to create my example dataset. Thanks in advance for any help.

Product <- c("Apples", "Apples", "Apples", "Apples", "Apples", "Apples",
             "Oranges", "Oranges", "Oranges", "Oranges", "Oranges", "Oranges",
             "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits")
ProductType <- c("Green", "Green", "Red", "Red", "Pink", "Pink",
                 "Big", "Big", "Medium", "Medium", "Small", "Small",
                 "Chocolate", "Chocolate", "Oat", "Oat", "Digestive", "Digestive")


Price <- c(2, 1, 3, 4, 1, 2,
           5, 3, 2, 1, 2, 3,
           6, 4, 1, 8, 6, 2)

Bought <- c(0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1)

Sold <- c(1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0)

sales <- data.frame(Product, ProductType, Price, Bought, Sold)

Upvotes: 2

Answers (3)

akrun

Reputation: 887118

With dplyr, we group by 'Product', 'ProductType', and summarise to create the 'BoughtPrice' and 'SoldPrice' by subsetting 'Price' where 'Bought' or 'Sold' is 1

library(dplyr)
sales %>% 
     group_by(Product, ProductType) %>% 
     summarise(BoughtPrice = Price[Bought==1], SoldPrice = Price[Sold ==1])

A similar approach with data.table would be

library(data.table)
setDT(sales)[, lapply(.SD, function(x) Price[x==1]),
                   .(Product, ProductType), .SDcols = Bought:Sold]

Upvotes: 2

kangaroo_cliff

Reputation: 6222

library(dplyr)
df <- data.frame(Product, ProductType, Price, Bought, Sold)
df %>% group_by(Product, ProductType) %>% 
  summarise(Bought_Price = sum(Price * Bought), 
            Sold_Price = sum(Sold * Price))

# A tibble: 9 x 4
# Groups:   Product [?]
# Product ProductType Bought_Price Sold_Price
# <fctr>      <fctr>        <dbl>      <dbl>
#   1   Apples       Green            1          2
# 2   Apples        Pink            2          1
# 3   Apples         Red            4          3
# 4 Buscuits   Chocolate            4          6
# 5 Buscuits   Digestive            2          6
# 6 Buscuits         Oat            8          1
# 7  Oranges         Big            3          5
# 8  Oranges      Medium            1          2
# 9  Oranges       Small            3          2

Upvotes: 3

zx8754

Reputation: 56159

Using dplyr:

library(dplyr)

sales %>% 
  group_by(Product, ProductType) %>% 
  summarise(BoughtPrice = Price[ Bought == 1 ],
            SoldPrice = Price[ Sold == 1 ]) %>% 
  ungroup()

Upvotes: 4

Merge two rows into one based on two identifiers

Answers (3)

Related Questions