jbest
jbest

Reputation: 640

Creating New Data Frame in R

I have data in this format in R

customer_key    item_key    units
2669699            16865    1.00
2669699            16866    1.00
2669699            46963    2.00
2685256            55271    1.00
2685256            43458    1.00
2685256            54977    1.00
2685256             2533    1.00
2685256            55011    1.00
2685256            44785    2.00

but I want to get the unique head_key as column and I want my the other variables name be the unique values in item_key and their value would be the units like this

customer_key       '16865'   '16866'  '46963'  '55271'   '43458'   '54977'    '2533'
    2669699          1.00     1.00     1.00     0.00      0.00      0.00       0.00 
    2685256          0.00     0.00     0.00     1.00      1.00      1.00       2.00

Please help me transform my data for cluster analysis

Upvotes: 4

Views: 120

Answers (4)

jazzurro
jazzurro

Reputation: 23574

Here is one way.

library(tidyr)

spread(mydf,item_key, units, fill = 0)

#  customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271
#1      2669699    0     1     1     0     0     2     0     0     0
#2      2685256    1     0     0     1     2     0     1     1     1

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

Since the packages have been covered (+1 to you all), here are a couple of base solutions to join the party:

xtabs:

xtabs(units ~ customer_key + item_key, df)
#             item_key
# customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271
#      2669699    0     1     1     0     0     2     0     0     0
#      2685256    1     0     0     1     2     0     1     1     1

reshape

reshape(df, direction = "wide", idvar = "customer_key", timevar = "item_key")
#   customer_key units.16865 units.16866 units.46963 units.55271
# 1      2669699           1           1           2          NA
# 4      2685256          NA          NA          NA           1
#   units.43458 units.54977 units.2533 units.55011 units.44785
# 1          NA          NA         NA          NA          NA
# 4           1           1          1           1           2

Upvotes: 3

David Arenburg
David Arenburg

Reputation: 92292

That's just a simple dcast task. Assuming df is your data set

library(reshape2)
dcast(df, customer_key ~ item_key , value.var = "units", fill = 0)
#   customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271
# 1      2669699    0     1     1     0     0     2     0     0     0
# 2      2685256    1     0     0     1     2     0     1     1     1

Upvotes: 3

KFB
KFB

Reputation: 3501

library(dplyr); library(tidyr)
df2 <- df %>% arrange(item_key) %>% spread(item_key, units, fill=0)
df2
#   customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271
# 1      2669699    0     1     1     0     0     2     0     0     0
# 2      2685256    1     0     0     1     2     0     1     1     1

Data

df <- structure(list(customer_key = c(2669699L, 2669699L, 2669699L, 
2685256L, 2685256L, 2685256L, 2685256L, 2685256L, 2685256L), 
    item_key = c(16865L, 16866L, 46963L, 55271L, 43458L, 54977L, 
    2533L, 55011L, 44785L), units = c(1, 1, 2, 1, 1, 1, 1, 1, 
    2)), .Names = c("customer_key", "item_key", "units"), class = "data.frame", row.names = c(NA, 
-9L))

Upvotes: 2

Related Questions