Reputation: 389
I have a dataframe inputData with columns acc_id, attrName, attrValue and timeStamp. The acc_id column will have duplicate entries because an example head of the data will be like this:
acc_id, attr_name, attr_value, timeStamp
123 abcVar1 138 Nov 1 2017
123 abcVar2 10 Nov 1 2017
123 abcVar3 2124 Nov 1 2017
468 abcVar1 13 Nov 9 2017
468 abcVar2 13458 Nov 9 2017
468 abcVar3 1238 Nov 9 2017
The situation can be generalized where attr_name will have a lot more than just 3 unique variables as well.
It needs to be converted into the format
acc_id abcVar1 abcVar2 abcVar3 timeStamp
123 138 10 2124 Nov 1 2017
468 13 13458 1238 Nov 9 2017
I tried spread(inputData,attr_name, attr_val) to get error
Upvotes: 2
Views: 2949
Reputation: 16121
Based on your example above I get this:
library(tidyr)
df = read.table(text = "
acc_id attr_name attr_value timeStamp
123 abcVar1 138 Nov_1_2017
123 abcVar2 10 Nov_1_2017
123 abcVar3 2124 Nov_1_2017
468 abcVar1 13 Nov_9_2017
468 abcVar2 13458 Nov_9_2017
468 abcVar3 1238 Nov_9_2017
", header=T, stringsAsFactors=F)
spread(df, attr_name, attr_value)
# acc_id timeStamp abcVar1 abcVar2 abcVar3
# 1 123 Nov_1_2017 138 10 2124
# 2 468 Nov_9_2017 13 13458 1238
Maybe your example dataframe is not representative of your actual dataset. I cannot replicate your problem.
However, it seems to break when there are duplicate rows like this:
library(tidyr)
library(dplyr)
df = read.table(text = "
acc_id attr_name attr_value timeStamp
123 abcVar1 138 Nov_1_2017
123 abcVar2 10 Nov_1_2017
123 abcVar3 2124 Nov_1_2017
468 abcVar1 13 Nov_9_2017
468 abcVar1 13 Nov_9_2017
468 abcVar2 13458 Nov_9_2017
468 abcVar3 1238 Nov_9_2017
", header=T, stringsAsFactors=F)
spread(df, attr_name, attr_value)
# Error: Duplicate identifiers for rows (4, 5)
# deduplicate and reshape
df %>%
distinct() %>%
spread(attr_name, attr_value)
# acc_id timeStamp abcVar1 abcVar2 abcVar3
# 1 123 Nov_1_2017 138 10 2124
# 2 468 Nov_9_2017 13 13458 1238
Upvotes: 2