coip
coip

Reputation: 1510

R reshape wide to long, using objects for the inputs

I have a list of data frames in wide format, with a factor variable in Column 1 and annualized data in Columns 2 onward. I want to graph these data. Doing so requires reshaping them into long format. Here is an example of one data frame:

# SAMPLE DATA
x <- structure(list(State = structure(1:3, .Label = c("Alabama", "Alaska", "Arizona", "Arkansas"), class = "factor"), Green.And.Blue.Score.2001 = c(0L, 40L, 65L), Green.And.Blue.Score.2002 = c(20L, 5L, 60L), Green.And.Blue.Score.2003 = c(35L, 15L, 30L)), .Names = c("State", "Green.And.Blue.Score.2001", "Green.And.Blue.Score.2002", "Green.And.Blue.Score.2003"), row.names = c(NA, 3L), class = "data.frame")

x
#    State Green.And.Blue.Score.2001 Green.And.Blue.Score.2002 Green.And.Blue.Score.2003
#1 Alabama          0                          20                         35
#2  Alaska         40                           5                         15
#3 Arizona         65                          60                         30

I usually do this using reshape(). For instance, this works fine:

# RESHAPE WIDE TO LONG (MANUALLY)
y <- reshape(x, 
    idvar = 'State',
  varying = c('Green.And.Blue.Score.2001', 'Green.And.Blue.Score.2002', 'Green.And.Blue.Score.2003'),
  v.names = 'Green.And.Blue.Score.',
    times = c('2001', '2002', '2003'),
direction = 'long')

y
# State    time   Green.And.Blue.Score.
# Alabama  2001      0
# Alaska   2001     40
# Arizona  2001     65
# Alabama  2002     20
# Alaska   2002      5
# Arizona  2002     60
# Alabama  2003     35
# Alaska   2003     15
# Arizona  2003     30

However, I didn't want to manually have to input the idvar, varying, v.name, and times variables for the dozens of data frames I have, and because some of the column names are quite long and complex, and vary considerably from data frame to data frame, simple reshape() commands can't automatically deparse them. My thought was to create a function to obtain those inputs from the data frame, the precursor of which is as follows:

# RESHAPE WIDE TO LONG (FUNCTIONALIZED)
id <- noquote(paste("'", names(x[1]), "'", sep = ""))
va <- noquote(paste("c('", paste(names(x)[2:length(x)], collapse = "', '"), "')", sep = ""))
vn <- noquote(paste("'", sub("(\\..*)$", ".", names(x)[2]) , "'", sep = ""))
ti <- noquote(paste("c('", paste(sub(".*(\\d{4})$", "\\1", names(x[2:length(x)])), collapse = "', '"), "')", sep = ""))

The output of each of those matches the idvar, varying, v.name, and times inputs from #RESHAPE WIDE TO LONG (MANUALLY) above:

id
# 'State'
va
# c(''Green.And.Blue.Score.2001', ''Green.And.Blue.Score.2002', ''Green.And.Blue.Score.2003')
vn
# ''Green.And.Blue.Score.'
ti
# c('2001', '2002', '2003')

But, when I try to use those objects in the reshape() function, I get an error message:

y <- reshape(x, 
    idvar = id,
  varying = va,
  v.names = vn,
    times = ti,
direction = 'long')

Error in [.data.frame(data, , varying[[i]][1L]) : undefined columns selected

I'm sure my solution to 'functionalize' reshape() is not ideal. What should I be doing instead?

Upvotes: 0

Views: 144

Answers (1)

IRTFM
IRTFM

Reputation: 263311

The effort to put quotes around the material extracted from names is causing the error. This is a simplification of that code. Note that I removed the v.names and times since these are automagically calculated when the column names are properly separated by "."

y <- reshape(x, 
     idvar = names(x)[1],
   varying = names(x)[-1],

 direction = 'long')
 y
#-----    
           State time Score
Alabama.2001 Alabama 2001     0
Alaska.2001   Alaska 2001    40
Arizona.2001 Arizona 2001    65
Alabama.2002 Alabama 2002    20
Alaska.2002   Alaska 2002     5
Arizona.2002 Arizona 2002    60
Alabama.2003 Alabama 2003    35
Alaska.2003   Alaska 2003    15
Arizona.2003 Arizona 2003    30

If we use this on your new example we can get a "split" at the ".S" that give a reasonable result. The column name text between the first period and the "split" pattern gets moved to the column name while the leading state name and year get appeanded together as a rowname:

 y <- reshape(x, 
      idvar = names(x)[1],
    varying = names(x)[-1],
    split = list(regexp = "\\.S", include = TRUE),
  direction = 'long')
  y

                     State       time Green.And.Blue.
Alabama.Score.2001 Alabama Score.2001               0
Alaska.Score.2001   Alaska Score.2001              40
Arizona.Score.2001 Arizona Score.2001              65
Alabama.Score.2002 Alabama Score.2002              20
Alaska.Score.2002   Alaska Score.2002               5
Arizona.Score.2002 Arizona Score.2002              60
Alabama.Score.2003 Alabama Score.2003              35
Alaska.Score.2003   Alaska Score.2003              15
Arizona.Score.2003 Arizona Score.2003              30

Upvotes: 2

Related Questions