Reputation: 1510
I have a list of data frames in wide format, with a factor variable in Column 1 and annualized data in Columns 2 onward. I want to graph these data. Doing so requires reshaping them into long format. Here is an example of one data frame:
# SAMPLE DATA
x <- structure(list(State = structure(1:3, .Label = c("Alabama", "Alaska", "Arizona", "Arkansas"), class = "factor"), Green.And.Blue.Score.2001 = c(0L, 40L, 65L), Green.And.Blue.Score.2002 = c(20L, 5L, 60L), Green.And.Blue.Score.2003 = c(35L, 15L, 30L)), .Names = c("State", "Green.And.Blue.Score.2001", "Green.And.Blue.Score.2002", "Green.And.Blue.Score.2003"), row.names = c(NA, 3L), class = "data.frame")
x
# State Green.And.Blue.Score.2001 Green.And.Blue.Score.2002 Green.And.Blue.Score.2003
#1 Alabama 0 20 35
#2 Alaska 40 5 15
#3 Arizona 65 60 30
I usually do this using reshape()
. For instance, this works fine:
# RESHAPE WIDE TO LONG (MANUALLY)
y <- reshape(x,
idvar = 'State',
varying = c('Green.And.Blue.Score.2001', 'Green.And.Blue.Score.2002', 'Green.And.Blue.Score.2003'),
v.names = 'Green.And.Blue.Score.',
times = c('2001', '2002', '2003'),
direction = 'long')
y
# State time Green.And.Blue.Score.
# Alabama 2001 0
# Alaska 2001 40
# Arizona 2001 65
# Alabama 2002 20
# Alaska 2002 5
# Arizona 2002 60
# Alabama 2003 35
# Alaska 2003 15
# Arizona 2003 30
However, I didn't want to manually have to input the idvar
, varying
, v.name
, and times
variables for the dozens of data frames I have, and because some of the column names are quite long and complex, and vary considerably from data frame to data frame, simple reshape()
commands can't automatically deparse them. My thought was to create a function to obtain those inputs from the data frame, the precursor of which is as follows:
# RESHAPE WIDE TO LONG (FUNCTIONALIZED)
id <- noquote(paste("'", names(x[1]), "'", sep = ""))
va <- noquote(paste("c('", paste(names(x)[2:length(x)], collapse = "', '"), "')", sep = ""))
vn <- noquote(paste("'", sub("(\\..*)$", ".", names(x)[2]) , "'", sep = ""))
ti <- noquote(paste("c('", paste(sub(".*(\\d{4})$", "\\1", names(x[2:length(x)])), collapse = "', '"), "')", sep = ""))
The output of each of those matches the idvar
, varying
, v.name
, and times
inputs from #RESHAPE WIDE TO LONG (MANUALLY)
above:
id
# 'State'
va
# c(''Green.And.Blue.Score.2001', ''Green.And.Blue.Score.2002', ''Green.And.Blue.Score.2003')
vn
# ''Green.And.Blue.Score.'
ti
# c('2001', '2002', '2003')
But, when I try to use those objects in the reshape()
function, I get an error message:
y <- reshape(x,
idvar = id,
varying = va,
v.names = vn,
times = ti,
direction = 'long')
Error in [.data.frame(data, , varying[[i]][1L]) : undefined columns selected
I'm sure my solution to 'functionalize' reshape()
is not ideal. What should I be doing instead?
Upvotes: 0
Views: 144
Reputation: 263311
The effort to put quotes around the material extracted from names is causing the error. This is a simplification of that code. Note that I removed the v.names and times since these are automagically calculated when the column names are properly separated by "."
y <- reshape(x,
idvar = names(x)[1],
varying = names(x)[-1],
direction = 'long')
y
#-----
State time Score
Alabama.2001 Alabama 2001 0
Alaska.2001 Alaska 2001 40
Arizona.2001 Arizona 2001 65
Alabama.2002 Alabama 2002 20
Alaska.2002 Alaska 2002 5
Arizona.2002 Arizona 2002 60
Alabama.2003 Alabama 2003 35
Alaska.2003 Alaska 2003 15
Arizona.2003 Arizona 2003 30
If we use this on your new example we can get a "split" at the ".S" that give a reasonable result. The column name text between the first period and the "split" pattern gets moved to the column name while the leading state name and year get appeanded together as a rowname:
y <- reshape(x,
idvar = names(x)[1],
varying = names(x)[-1],
split = list(regexp = "\\.S", include = TRUE),
direction = 'long')
y
State time Green.And.Blue.
Alabama.Score.2001 Alabama Score.2001 0
Alaska.Score.2001 Alaska Score.2001 40
Arizona.Score.2001 Arizona Score.2001 65
Alabama.Score.2002 Alabama Score.2002 20
Alaska.Score.2002 Alaska Score.2002 5
Arizona.Score.2002 Arizona Score.2002 60
Alabama.Score.2003 Alabama Score.2003 35
Alaska.Score.2003 Alaska Score.2003 15
Arizona.Score.2003 Arizona Score.2003 30
Upvotes: 2