oalsing

Reputation: 100

Summarize specific set of rows

I want to summarise the data of each item (multiple rows) in order to remove the items that haven't had any type of promotion. Please see the example data.

item_code vendor_code launch_month unit_price department_name  category_name
1  I-111164         V10     2007.M01        118            Face Face Treatment
2  I-111164         V10     2007.M01        118            Face Face Treatment
3  I-111164         V10     2007.M01        118            Face Face Treatment
4  I-111164         V10     2007.M01        118            Face Face Treatment
5  I-111164         V10     2007.M01        118            Face Face Treatment
6  I-111164         V10     2007.M01        118            Face Face Treatment
      subcategory_name sales_velocity sales_month sales_unit      promotion_type
1 Face Treatment Other              B    2008.M01   41.00000        no_promotion
2 Face Treatment Other              B    2008.M02   55.00000        no_promotion
3 Face Treatment Other              B    2008.M03   64.80000 Catalogue Promotion
4 Face Treatment Other              B    2008.M04   46.00000        no_promotion
5 Face Treatment Other              B    2008.M05   67.00000        no_promotion
6 Face Treatment Other              B    2008.M06   58.40000 Catalogue Promotion

What would be the best practice way to do this in R?

Upvotes: 0

Views: 66

Answers (3)

Dagremu

Reputation: 325

First you want to get the item codes for all items that ever got a promotion. Assuming your data frame is called df, use

got.promoted <- df$item_code[df$promotion_type != "no_promotion"]

This will be a vector with the right codes. It may contain duplicates, which you can get rid of with

got.promoted <- unique(got.promoted)

Then use this vector to select, from the original data frame, the items that got promotions:

new.df <- df[df$item_code %in% got.promoted, ]

I won't claim that this is the "best practice" way, but it should work and is easy to understand.
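
As a quick check, the steps above can be run on a small made-up data frame. This is only a sketch: the second item code, "I-222222", is hypothetical and was added so that there is an item without any promotion to drop.

```r
# Toy data (hypothetical): item "I-222222" never had a promotion.
df <- data.frame(
  item_code = c("I-111164", "I-111164", "I-111164", "I-222222", "I-222222"),
  promotion_type = c("no_promotion", "Catalogue Promotion", "no_promotion",
                     "no_promotion", "no_promotion"),
  stringsAsFactors = FALSE
)

# Codes of items with at least one promoted row, deduplicated.
got.promoted <- unique(df$item_code[df$promotion_type != "no_promotion"])

# Keep every row (promoted or not) of those items.
new.df <- df[df$item_code %in% got.promoted, ]
new.df
```

Note that new.df keeps the no_promotion rows of an item as long as that item was promoted at least once; only items that were never promoted disappear.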

Upvotes: 0

jazzurro

Reputation: 23574

library(dplyr)

# Rebuild the example data from the question (one item, six monthly rows)
item_code <- rep("I-111164", 6)
vendor_code <- rep("V10", 6)
launch_month <- rep("2007.M01", 6)
unit_price <- rep(118, 6)
department_name <- rep("Face", 6)
category_name <- rep("Face Treatment", 6)
subcategory_name <- rep("Face Treatment Other", 6)
sales_velocity <- rep("B", 6)
sales_month <- paste0("2008.M0", 1:6)
sales_unit <- c(41, 55, 64.8, 46, 67, 58.4)
promotion_type <- c("no_promotion", "no_promotion", "Catalogue Promotion",
                    "no_promotion", "no_promotion", "Catalogue Promotion")

# Create the data frame
foo <- data.frame(item_code, vendor_code, launch_month, unit_price, department_name,
                  category_name, subcategory_name, sales_velocity, sales_month,
                  sales_unit, promotion_type, stringsAsFactors = FALSE)

# Remove all rows with 'no_promotion'
foo2 <- filter(foo, promotion_type != "no_promotion")

# Get mean of sales unit for each item code
america <- foo2 %>%
       group_by(item_code) %>%
       summarize(sales = mean(sales_unit))

america
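
The filter() call above drops individual rows. If the goal is instead to keep every row of any item that was promoted at least once, one possible sketch is a grouped filter. The toy data frame and the item code "I-222222" below are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: "I-222222" never had a promotion.
toy <- data.frame(
  item_code = c("I-111164", "I-111164", "I-111164", "I-222222", "I-222222"),
  sales_unit = c(41, 64.8, 46, 10, 12),
  promotion_type = c("no_promotion", "Catalogue Promotion", "no_promotion",
                     "no_promotion", "no_promotion"),
  stringsAsFactors = FALSE
)

# Keep all rows of items that have at least one promoted row.
promoted_items <- toy %>%
  group_by(item_code) %>%
  filter(any(promotion_type != "no_promotion")) %>%
  ungroup()

# Summarise per item, now including the item's non-promoted months.
item_summary <- promoted_items %>%
  group_by(item_code) %>%
  summarize(sales = mean(sales_unit))

item_summary
```

Whether the mean should include the non-promoted months depends on what the summary is for, so treat this as one option and not as a replacement for the row-level version.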

Upvotes: 0

Sven Hohenstein

Reputation: 81753

The following command returns a data frame without the rows that haven't had any type of promotion:

dat[dat$promotion_type != "no_promotion", ]

where dat is the name of your data frame.
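
One caveat, assuming promotion_type might contain missing values (the question does not say whether it does): logical indexing with an NA comparison returns an all-NA row. Wrapping the condition in which() is a common way to avoid that. The small data frame below is made up for illustration.

```r
# Hypothetical data with one missing promotion_type.
dat <- data.frame(
  item_code = c("I-111164", "I-111164", "I-111164"),
  promotion_type = c("no_promotion", "Catalogue Promotion", NA),
  stringsAsFactors = FALSE
)

# Plain logical indexing: the NA comparison produces an extra all-NA row.
with_na <- dat[dat$promotion_type != "no_promotion", ]

# which() keeps only the positions where the comparison is TRUE.
promoted <- dat[which(dat$promotion_type != "no_promotion"), ]
promoted
```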

Upvotes: 1
