Susan Mwansa

Reputation: 51

How do I remove specific characters from a data frame?

I have a raw data frame that I am cleaning up.

There's a column with thousands of rows that look like: c("round", "square", "triangle")

I would like the end result for each row to look like: round, square, triangle

Any help?

dput(ItemisedOrders[1:10, "Products", drop = FALSE])

structure(list(Products = list("Meatlovers Pizza", c("Supreme Pizza", 
"BBQ Chicken Pizza"), c("Seafood Pizza", "Vegetarian Pizza"), 
    c("Margherita Pizza", "Supreme Pizza", "Meatlovers Pizza"
    ), c("BBQ Chicken Pizza", "Hawaiian Pizza", "Meatlovers Pizza"
    ), c("Hawaiian Pizza", "Supreme Pizza"), c("Hawaiian Pizza", 
    "Pepperoni Pizza"), c("Seafood Pizza", "BBQ Chicken Pizza", 
    "Vegetarian Pizza", "Hawaiian Pizza"), "Pepperoni Pizza", 
    c("Margherita Pizza", "Supreme Pizza"))), row.names = c(NA, 
10L), class = "data.frame")

Upvotes: 0

Views: 99

Answers (2)

Gregor Thomas

Reputation: 146164

Thanks for the dput - it looks like you have a list column - each row is a vector! We can use sapply to apply a function to each row, and luckily the toString function does what you want. Calling your data df:

df$Products = sapply(df$Products, toString)
df
# 1                                                    Meatlovers Pizza
# 2                                    Supreme Pizza, BBQ Chicken Pizza
# 3                                     Seafood Pizza, Vegetarian Pizza
# 4                   Margherita Pizza, Supreme Pizza, Meatlovers Pizza
# 5                 BBQ Chicken Pizza, Hawaiian Pizza, Meatlovers Pizza
# 6                                       Hawaiian Pizza, Supreme Pizza
# 7                                     Hawaiian Pizza, Pepperoni Pizza
# 8  Seafood Pizza, BBQ Chicken Pizza, Vegetarian Pizza, Hawaiian Pizza
# 9                                                     Pepperoni Pizza
# 10                                    Margherita Pizza, Supreme Pizza
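
As a minimal sketch of the same idea on a self-contained example: the toy data frame below is hypothetical and only mirrors the list-column structure from the question. It uses vapply, which behaves like sapply here but also checks that every row collapses to exactly one string, and it compares toString with the equivalent paste(x, collapse = ", ") call.

```r
# Hypothetical toy data with a list column, shaped like the question's data
df <- data.frame(id = 1:3)
df$Products <- list(
  "Meatlovers Pizza",
  c("Supreme Pizza", "BBQ Chicken Pizza"),
  c("Seafood Pizza", "Vegetarian Pizza")
)

# vapply works like sapply, but errors unless each result is one string
flat <- vapply(df$Products, toString, character(1))

# For character vectors, toString(x) gives the same result as
# paste(x, collapse = ", ")
same <- vapply(df$Products, paste, character(1), collapse = ", ")

print(identical(flat, same))

# Overwrite the list column with the flattened character vector
df$Products <- flat
str(df)
```

After this, Products is an ordinary character column, so it prints and exports (for example with write.csv) like any other text column.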

Upvotes: 2

Paul Tansley

Reputation: 181

I don't really understand your question, but if it's in a data frame and you just want to remove the quotation marks, you could try:

your_data$column_name <- gsub("\"", "", your_data$column_name)

The first argument is the pattern that gets replaced and the second is what replaces it. Leaving the second one as an empty string "" means the match is simply deleted.

EDIT

I hadn't actually tried running it. I think these characters are a bit trickier because R treats them as special symbols. I've got it to work with:

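# Remove the double quotes
df$Products <- gsub("\"", "", df$Products)
# Remove only the leading c( and the trailing ); a bare "c" pattern
# would also delete the letter c inside names such as "Chicken"
df$Products <- gsub("^c\\(|\\)$", "", df$Products)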
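
To see why gsub has anything to strip in the first place: gsub coerces its input with as.character, and for a list that turns each multi-element vector into its deparsed text, such as c("a", "b"). The sketch below uses a hypothetical two-element list, and the anchored pattern is an assumption about how to strip only the wrapper; it presumes no product name itself begins with c( or ends with ).

```r
# Hypothetical list shaped like the Products column in the question
products <- list("Pepperoni Pizza", c("Supreme Pizza", "BBQ Chicken Pizza"))

# This is the text gsub actually operates on
as_text <- as.character(products)
print(as_text)

# Drop the double quotes left over from deparsing
cleaned <- gsub("\"", "", as_text)

# Drop a leading "c(" and a trailing ")" only, leaving other letters alone
cleaned <- gsub("^c\\(|\\)$", "", cleaned)
print(cleaned)
```

Single-element rows pass through unchanged, because as.character leaves a length-one character vector as the plain string.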

Upvotes: 0
