Bryan
Bryan

Reputation: 6149

Reshaping R dataframe where unique key is a new row

I have an R DataFrame of data with customer reviews, where the auditor has put in multiple reason codes by copying the whole review, and inserting each reason code in a new row. Here's what I have:

Item    Category        Reason                 Review  
Vacuum  Performance     Bad Suction            I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Design          Cord is too short      I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Vacuum  Color           Wrong Color            I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size            too big                The boat was way too big, and was slow.
Boat    Performance     slow                   The boat was way too big, and was slow.
Tube    Inflation       low inflation          The tube was not inflated enough

I'm looking to group it by the shared columns (Item and Review) and create category and reason columns for the multiple reasons and categories. Let's assume ahead of time that I don't know the number of unique reasons and categories per item ahead of time, as I'm showing you dummy data.

So, what I would want is this:

Item    Category.1    Category.2   Category.3  Reason.1       Reason.2           Reason.3      Review  
Vacuum  Performance   Design       Color       Bad Suction    Cord is too short  Wrong Color   I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
Boat    Size          Performance    NA        too big        slow               NA            The boat was way too big, and was slow.
Tube    Inflation     NA             NA        low inflation  NA                 NA            The tube was not inflated enough

I tried using the following code to no avail:

reshape(data, direction = "wide", 
        idvar = c("Item", "Review" ), 
        timevar = c("Category", "Reason"))

Here's the data:

dput(Data)
structure(list(Item = c("Vacuum", "Vacuum", "Vacuum", "Boat", 
"Boat", "Tube"), Category = c("Performance", "Design", 
"Color", "Size", "Performance", "Inflation"
), Reason = c("Bad Suction", "Cord is too short", "Wrong Color", 
"too big", "slow", "low inflation"), Review = c("I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.", 
"The boat was way too big, and was slow.", "The boat was way too big, and was slow.", 
"The tube was not inflated enough")), .Names = c("Item", "Category", 
"Reason", "Review"), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 0

Views: 180

Answers (1)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193547

You just need to create a "time" variable from your "item" column:

Data$UniqueReview <- ave(Data$Item, Data$Item, FUN = seq_along)
out <- reshape(Data, direction = "wide", idvar="Item", timevar="UniqueReview")
names(out)
#  [1] "Item"       "Category.1" "Reason.1"   "Review.1"   "Category.2" "Reason.2"  
#  [7] "Review.2"   "Category.3" "Reason.3"   "Review.3" 

Here are the "category" and "reason" columns from the resulting "wide" dataset (just so it fits on the screen).

out[, grep("Item|Category|Reason", names(out))]
#     Item  Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Vacuum Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4   Boat        Size       too big Performance              slow       <NA>        <NA>
# 6   Tube   Inflation low inflation        <NA>              <NA>       <NA>        <NA>

Also, library(reshape) does not refer to the inbuilt reshape function that you're trying to use. Instead, that's the old version of the "reshape2" package.


Rereading your question and your comment, since you can assume that the "Review" column can be treated as an ID column of its own, just change the reshape command accordingly:

reshape(Data, direction = "wide", idvar=c("Item", "Review"), timevar="UniqueReview")
#     Item
# 1 Vacuum
# 4   Boat
# 6   Tube
#                                                                                        Review
# 1 I bought the vacuum. The suction was bad, the cord is too short, and it is the wrong color.
# 4                                                     The boat was way too big, and was slow.
# 6                                                            The tube was not inflated enough
#    Category.1      Reason.1  Category.2          Reason.2 Category.3    Reason.3
# 1 Performance   Bad Suction      Design Cord is too short      Color Wrong Color
# 4        Size       too big Performance              slow       <NA>        <NA>
# 6   Inflation low inflation        <NA>              <NA>       <NA>        <NA>

Upvotes: 1

Related Questions