Reputation: 6149
I have an R DataFrame of data with customer reviews, where the auditor has put in multiple reason codes by copying the whole review, and inserting each reason code in a new row. Here's what I have:
Item Category Reason Review
Vacuum Performance Bad Suction I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum Design Cord is too short I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum Color Wrong Color I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat Size too big The boat was way too big, and was slow.
Boat Performance slow The boat was way too big, and was slow.
Tube Inflation low inflation The tube was not inflated enough
I'm looking to group it by the shared columns (Item and Review) and create category and reason columns for the multiple reasons and categories. Let's assume ahead of time that I don't know the number of unique reasons and categories per item ahead of time, as I'm showing you dummy data.
So, what I would want is this:
Item Category.1 Category.2 Category.3 Reason.1 Reason.2 Reason.3 Review
Vacuum Performance Design Color Bad Suction Cord is too short Wrong Color I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat Size Performance NA too big slow NA The boat was way too big, and was slow.
Tube Inflation NA NA low inflation NA NA The tube was not inflated enough
I tried using the following code to no avail:
reshape(data, direction = "wide",
idvar = c("Item", "Review" ),
timevar = c("Category", "Reason"))
Here's the data:
dput(Data)
structure(list(Item = c("Vacuum", "Vacuum", "Vacuum", "Boat",
"Boat", "Tube"), Category = c("Performance", "Design",
"Color", "Size", "Performance", "Inflation"
), Reason = c("Bad Suction", "Cord is too short", "Wrong Color",
"too big", "slow", "low inflation"), Review = c("I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.",
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.",
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.",
"The boat was way too big, and was slow.", "The boat was way too big, and was slow.",
"The tube was not inflated enough")), .Names = c("Item", "Category",
"Reason", "Review"), class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 0
Views: 180
Reputation: 193547
You just need to create a "time" variable from your "item" column:
Data$UniqueReview <- ave(Data$Item, Data$Item, FUN = seq_along)
out <- reshape(Data, direction = "wide", idvar="Item", timevar="UniqueReview")
names(out)
# [1] "Item" "Category.1" "Reason.1" "Review.1" "Category.2" "Reason.2"
# [7] "Review.2" "Category.3" "Reason.3" "Review.3"
Here are the "category" and "reason" columns from the resulting "wide" dataset (just so it fits on the screen).
out[, grep("Item|Category|Reason", names(out))]
# Item Category.1 Reason.1 Category.2 Reason.2 Category.3 Reason.3
# 1 Vacuum Performance Bad Suction Design Cord is too short Color Wrong Color
# 4 Boat Size too big Performance slow <NA> <NA>
# 6 Tube Inflation low inflation <NA> <NA> <NA> <NA>
Also, library(reshape)
does not refer to the inbuilt reshape
function that you're trying to use. Instead, that's the old version of the "reshape2" package.
Rereading your question and your comment, since you can assume that the "Review" column can be treated as an ID column of its own, just change the reshape
command accordingly:
reshape(Data, direction = "wide", idvar=c("Item", "Review"), timevar="UniqueReview")
# Item
# 1 Vacuum
# 4 Boat
# 6 Tube
# Review
# 1 I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
# 4 The boat was way too big, and was slow.
# 6 The tube was not inflated enough
# Category.1 Reason.1 Category.2 Reason.2 Category.3 Reason.3
# 1 Performance Bad Suction Design Cord is too short Color Wrong Color
# 4 Size too big Performance slow <NA> <NA>
# 6 Inflation low inflation <NA> <NA> <NA> <NA>
Upvotes: 1