Reputation:
I want to merge rows together that have the same word in another column. The table entries are all comma separated strings (characters), not lists. So as shown below the color shades for the same color should be added in a string in one row, instead of spanning over several rows. Also there should be no duplicates in the color shades column.
I have already tried with:
aggregate(df["Color shades"], df["Color"], paste, collapse=", ")
As well as with:
aggregate(Color shades ~ Color ,df ,toString)
But that didn't lead to the desired result.
Dataframe:
df <- data.frame(colorshades = c("turquoise, babyblue", "royal blue, true blue",
"navy blue, true blue"), colors = c("blue", "blue", "blue"))
Currently:
Color shades | Color |
---|---|
turquoise, babyblue | blue |
royal blue, true blue | blue |
navy blue, true blue | blue |
Desired Output:
Color shades | Color |
---|---|
turquoise, babyblue, royal blue, true blue, navy blue | blue |
Upvotes: 1
Views: 59
Reputation: 973
One can also use tidytext
to with unnest
library(dplyr)
library(tidytext)
color_df <- tibble(color= rep("blue", times = 3),
color_shades = c("turquoise, babyblue", "royal blue, true blue", "navy blue, true blue"))
color_shades_agg <- color_df %>%
unnest_tokens(word, color_shades, token = 'regex', pattern=", ") %>%
group_by(color) %>%
distinct() %>%
summarise(color_shades = paste0(sort(word), collapse = ", "))
Upvotes: 0
Reputation: 27732
data.table
solution
library(data.table)
setDT(df)[, .(Color_shades = paste0(unique(unlist(strsplit(colorshades, ", "))),
collapse = ", ")),
by = .(colors)]
# colors Color_shades
# 1: blue turquoise, babyblue, royal blue, true blue, navy blue
Upvotes: 0
Reputation: 351
if you can use the library "dplyr" you can also do it this way :
library(dplyr)
df <- data.frame("Colorshade" = c("turquoise, babyblue", "royal blue, true blue", "navy blue, true blue"),
"Color" = c(rep("blue", 3)),
stringsAsFactors = FALSE)
my_df <- df %>% group_by(Color) %>% mutate(Colorshade = paste(unique(sort(str_split(string = paste(df$Colorshade, collapse = ", "), pattern = ", ", simplify = TRUE))), collapse = ", ")) %>% first()
Upvotes: 0
Reputation: 160447
Convert "Color shades"
to a list-column:
lapply(strsplit(df[["Color shades"]], ","), trimws)
# [[1]]
# [1] "turquoise" "babyblue"
# [[2]]
# [1] "royal blue" "true blue"
# [[3]]
# [1] "navy blue" "true blue"
df[["Color shades"]] <- lapply(strsplit(df[["Color shades"]], ","), trimws)
df
# Color shades Color
# 1 turquoise, babyblue blue
# 2 royal blue, true blue blue
# 3 navy blue, true blue blue
Aggregate with unique
:
aggregate(df["Color shades"], df["Color"], function(z) paste(unique(unlist(z)), collapse=", "))
# Color Color shades
# 1 blue turquoise, babyblue, royal blue, true blue, navy blue
or, keeping with the list-column method,
aggregate(df["Color shades"], df["Color"], function(z) list(unique(unlist(z))))
# Color Color shades
# 1 blue turquoise, babyblue, royal blue, true blue, navy blue
str(aggregate(df["Color shades"], df["Color"], function(z) list(unique(unlist(z)))))
# 'data.frame': 1 obs. of 2 variables:
# $ Color : chr "blue"
# $ Color shades:List of 1
# ..$ : chr "turquoise" "babyblue" "royal blue" "true blue" ...
There are often (but not always) advantages to dealing with list-columns vice comma-separated values. If your use-case is such that you frequently want to look at individual elements within one of these fields, you'll find yourself dealing deeply with regexes and/or repeatedly using strsplit
to separate. With list-columns, one can use tools like unique
and %in%
with abandon (though admittedly one should really become more comfortable with lapply
/sapply
, and many base-R tools for aggregation do not always work consistently with list-columns).
Data
df <- structure(list(`Color shades` = c("turquoise, babyblue", "royal blue, true blue", "navy blue, true blue"), Color = c("blue", "blue", "blue")), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 4