histelheim
histelheim

Reputation: 5088

Creating longitudinal datasets with reshape

I have the dataset:

top100_repository_name  month   monthly_increase    monthly_begin_at    monthly_end_with
Bukkit                  2012-03 9                   431                 440
Bukkit                  2012-04 19                  438                 457
Bukkit                  2012-05 19                  455                 474
CodeIgniter             2012-03 15                  492                 507
CodeIgniter             2012-04 50                  506                 556
CodeIgniter             2012-05 19                  555                 574

I use the following R code:

library(reshape)
latent.growth.data <- read.csv(file = "LGC_data.csv", header = TRUE)
melt(latent.growth.data, id = c("top100_repository_name", "month"), measured = c("monthly_end_with"))
cast(latent.growth.data, top100_repository_name + month ~ monthly_end_with)

Which I want to use to create a dataset that has the following structure:

top100_repository_name    2012-03    2012-04    2012-05
Bukkit                    440        457        474
CodeIgniter               507        556        574

However, when I run my code I get the following output:

Using monthly_end_with as value column.  Use the value argument to cast to override this choice
Error in `[.data.frame`(data, , variables, drop = FALSE) : 
  undefined columns selected

How can I modify my code so that I generate the desired output?

Upvotes: 2

Views: 466

Answers (2)

thelatemail
thelatemail

Reputation: 93833

Someone will be along soon with a plyr solution i'm sure, but here is a base solution using the reshape function.

test <- read.table(textConnection("top100_repository_name  month   monthly_increase    monthly_begin_at    monthly_end_with
Bukkit                  2012-03 9                   431                 440
Bukkit                  2012-04 19                  438                 457
Bukkit                  2012-05 19                  455                 474
CodeIgniter             2012-03 15                  492                 507
CodeIgniter             2012-04 50                  506                 556
CodeIgniter             2012-05 19                  555                 574"),header=TRUE)

Reshape this here data:

test2 <- reshape(
    test[c("top100_repository_name","month","monthly_end_with")],
    idvar="top100_repository_name",
    timevar="month",
    direction="wide"
)

Fix the names

names(test2) <- gsub("monthly_end_with.","",names(test2))

Which looks like:

> test2
  top100_repository_name 2012-03 2012-04 2012-05
1                 Bukkit     440     457     474
4            CodeIgniter     507     556     574

Upvotes: 6

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

Here's another pretty direct approach in base R. Use xtabs():

xtabs(monthly_end_with ~ top100_repository_name + month, test)
#                       month
# top100_repository_name 2012-03 2012-04 2012-05
#            Bukkit          440     457     474
#            CodeIgniter     507     556     574

as.data.frame.matrix(
  xtabs(monthly_end_with ~ top100_repository_name + month, test))
#             2012-03 2012-04 2012-05
# Bukkit          440     457     474
# CodeIgniter     507     556     574

Or, as indicated by @thelatemail, there's dcast from the "reshape2" package which can be used as follows:

dcast(test, top100_repository_name ~ month, value.var="monthly_end_with")
#   top100_repository_name 2012-03 2012-04 2012-05
# 1                 Bukkit     440     457     474
# 2            CodeIgniter     507     556     574

Upvotes: 4

Related Questions