Timothy Alston
Timothy Alston

Reputation: 1549

How to prevent reshape() from changing variable names

I'm having some problems with the reshape() function. Once I have reshaped, it is changing all my variable names to be "value.var 1" for example.

The code I am using to reshape is:

test<- reshape(cdc_city, idvar= "site", timevar="variable", 
               direction="wide", new.row.names=FALSE)

I am guessing I am using the "new.row.names=FALSE" part wrong. I have attempted substituting FALSE for NULL, but that also didn`t work.

My aim is to transform a data set as follows:

site    variable    value
site 1  var 1       4
site 1  var 2       7
site 1  var 3       2
site 1  var 4       6
site 1  var 5       3
site 2  var 1       89
site 2  var 2       43
site 2  var 3       12
site 2  var 4       54
site 2  var 5       23
site 3  var 1       76
site 3  var 2       62
site 3  var 3       13
site 3  var 4       43
site 3  var 5       23

into a data set like this:

site    var 1    var 2    var 3    var 4    var 5
1       4        7        2        6        3
2       89       43       12       54       23
3       76       62       13       43       23

If anyone knows how to get rid of the "value." appearing at the beginning of my variable names, that would be great! Or if there is a better bit of code I could use I am very open to that also.

Thanks,

Timothy

Upvotes: 2

Views: 3317

Answers (3)

Dave
Dave

Reputation: 2526

Yes, this can be done with built-in base::reshape().

For direction=="wide", you can rename reshaped variables using a list with the varying argument. As you have discovered, reshape will generate a column name that looks like `v.name`.`times`. R documentation erroneously suggests that varying accepts a vector of names (it does for direction=="long").

cdc_city <- data.frame(
    site=paste("site", rep(1:3, each=5)),
    variable=paste("var", rep(1:5, 3)), 
    value=c(4,7,2,6,3,89,43,12,54,23,76,62,13,43,23))

#       site variable value
#  1  site 1    var 1     4
#  2  site 1    var 2     7
#  3  site 1    var 3     2
#  4  site 1    var 4     6
#  5  site 1    var 5     3
#  6  site 2    var 1    89
#  7  site 2    var 2    43
#  8  site 2    var 3    12
#  9  site 2    var 4    54
#  10 site 2    var 5    23
#  11 site 3    var 1    76
#  12 site 3    var 2    62
#  13 site 3    var 3    13
#  14 site 3    var 4    43
#  15 site 3    var 5    23

test <- reshape(cdc_city,
    varying=list(c("var 1", "var 2", "var 3", "var 4", "var 5")),
    idvar= "site", timevar="variable", direction="wide")

#       site var 1 var 2 var 3 var 4 var 5
#  1  site 1     4     7     2     6     3
#  6  site 2    89    43    12    54    23
#  11 site 3    76    62    13    43    23

Be careful about data.frames where strings are factors. You can specify varying from a factor in the data.frame directly using

reshape( ..., varying=list(as.character(unique(cdc_city$variable))), ...)

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193537

For reshaping like this, xtabs is also useful:

xtabs(value ~ site + variable, data = cdc_city)
#         variable
# site     var 1 var 2 var 3 var 4 var 5
#   site 1     4     7     2     6     3
#   site 2    89    43    12    54    23
#   site 3    76    62    13    43    23

Regarding your question, though, I don't actually see this as a "problem". That's exactly how it is supposed to work.

Consider the following scenario. You have another column in your data.frame (we'll call it "other") and you want to reshape that whole dataset from long to wide.

Here's some sample data and output:

set.seed(1)
cdc_city$other = sample(1:20, 15, replace=TRUE)
reshape(cdc_city, direction="wide", 
        idvar="site", timevar="variable")
#      site value.var 1 other.var 1 value.var 2 other.var 2 value.var 3
# 1  site 1           4           6           7           8           2
# 6  site 2          89          18          43          19          12
# 11 site 3          76           5          62           4          13
#    other.var 3 value.var 4 other.var 4 value.var 5 other.var 5
# 1           12           6          19           3           5
# 6           14          54          13          23           2
# 11          14          43           8          23          16

In my opinion, tacking on the value and other to the variable names is essential in this case.

Finally, your new.row.names argument is useless here because your unhappiness is with the column names, not row names.

Update

Since I'm in the mood for sharing alternatives, you may also be interested in exploring aggregate, which also has fairly easy to understand syntax:

aggregate(list(var = cdc_city$value), 
          list(site = cdc_city$site), c)
#     site var.1 var.2 var.3 var.4 var.5
# 1 site 1     4     7     2     6     3
# 2 site 2    89    43    12    54    23
# 3 site 3    76    62    13    43    23

If you use aggregate, you also have some control over the names of the resulting variables.

Upvotes: 4

Andrie
Andrie

Reputation: 179448

I suggest you use the reshape2 package instead - it's much easier to use melt and dcast than the built-in reshape():

library(reshape2)
dcast(cdc_city, site~variable, mean)
    site var 1 var 2 var 3 var 4 var 5
1 site 1     4     7     2     6     3
2 site 2    89    43    12    54    23
3 site 3    76    62    13    43    23

Upvotes: 2

Related Questions