Chris
Chris

Reputation: 125

Splitting a column in an R dataframe based on a condition in another column of same dataframe

I have this dataframe:

df <- data.frame(item_number = c(1, 2, 3, 3),
                 description = c("Big Wheel Agriculture Cart, 300-Lb Capacity, 32.75w X 58d X 28.25h, Black", "Dome Lids For 16\" Cater Trays, 16\" Diameter X 2.5\"h, Clear, 50/carton", "Aluminum Cater Trays, 1 Compartment, 18\" Diameter X 0.94\"h, Silver, 50/carton", "Aluminum Cater Trays, 7 Compartment Lazy Susan, 18\" Diameter X 0.94\"h, Silver, 50/carton"))

I want to split the description column (using commas as the delimiter) based on the category number. I want to split category_number 1 into 4 columns (product name, capacity, item_dimensions, color), but for category number 3 I want split into 5 columns (product_name, compartments, item dimensions, color, quantity). I want these new columns to be attached to the original df dataframe.

I tried case_when and if_else but I am unable to conditionally split as I described above. I am not sure if what I want to do is possible.

EDITED:

My desired df output from splitting:

df <- data.frame(item_number = c(1, 2, 3, 3),
                 description = c("Big Wheel Agriculture Cart, 300-Lb Capacity, 32.75w X 58d X 28.25h, Black", "Dome Lids For 16\" Cater Trays, 16\" Diameter X 2.5\"h, Clear, 50/carton", "Aluminum Cater Trays, 1 Compartment, 18\" Diameter X 0.94\"h, Silver, 50/carton", "Aluminum Cater Trays, 7 Compartment Lazy Susan, 18\" Diameter X 0.94\"h, Silver, 50/carton"),
                 product_name = c("Big Wheel Agriculture Cart", "Dome Lids For 16\" Cater Trays",
                                  "Aluminum Cater Trays", "Aluminum Cater Trays"),
                 capacity = c("300-Lb Capacity", NA, NA, NA),
                 item_dimensions = c("32.75w X 58d X 28.25h", "16\" Diameter X 2.5\"h",
                                     "18\" Diameter X 0.94\"h", "18\" Diameter X 0.94\"h"),
                 color = c("Black", "Clear", "Silver", "Silver"),
                 quantity = c(NA, "50/carton", "50/carton", "50/carton"),
                 compartment = c(NA, NA, "1 Compartment", "7 Compartment Lazy Susan"))

Upvotes: 0

Views: 928

Answers (1)

Evgeniia
Evgeniia

Reputation: 21

I see the solution in:

  1. specifying column names in the right order for each item group you have (items_list)
  2. split data frame according to the unique item groups into several small data frames (split_df)
  3. then split the description column for each item in the right way (with separate function)
  4. finally, merge all small dataframe into the big one (bind_rows function)
library(tidyverse)

df <- data.frame(item_number = c(1, 2, 3, 3),
                 description = c("Big Wheel Agriculture Cart, 300-Lb Capacity, 32.75w X 58d X 28.25h, Black", "Dome Lids For 16\" Cater Trays, 16\" Diameter X 2.5\"h, Clear, 50/carton", "Aluminum Cater Trays, 1 Compartment, 18\" Diameter X 0.94\"h, Silver, 50/carton", "Aluminum Cater Trays, 7 Compartment Lazy Susan, 18\" Diameter X 0.94\"h, Silver, 50/carton"))


items_list <- list(
  c("product_name", "capacity", "item_dimensions", "color"),
  c("product_name", "item_dimensions", "color", "quantity"),
  c("product_name", "compartment", "item_dimensions", "color", "quantity")
)

split_df <- split(df, df$item_number)

new_df <- bind_rows(
  
  lapply(split_df, function(df_x) {
    item_n <- df_x$item_number[[1]]
    df_x %>% 
      mutate(description_split = description) %>% 
      separate(col = "description_split",
               into = items_list[[item_n]],
               sep = ", ")
  })
)

Upvotes: 1

Related Questions