Reputation: 1549
I'm having some problems with the reshape()
function. Once I have reshaped, it is changing all my variable names to be "value.var 1" for example.
The code I am using to reshape is:
test<- reshape(cdc_city, idvar= "site", timevar="variable",
direction="wide", new.row.names=FALSE)
I am guessing I am using the "new.row.names=FALSE" part wrong. I have attempted substituting FALSE for NULL, but that also didn`t work.
My aim is to transform a data set as follows:
site variable value
site 1 var 1 4
site 1 var 2 7
site 1 var 3 2
site 1 var 4 6
site 1 var 5 3
site 2 var 1 89
site 2 var 2 43
site 2 var 3 12
site 2 var 4 54
site 2 var 5 23
site 3 var 1 76
site 3 var 2 62
site 3 var 3 13
site 3 var 4 43
site 3 var 5 23
into a data set like this:
site var 1 var 2 var 3 var 4 var 5
1 4 7 2 6 3
2 89 43 12 54 23
3 76 62 13 43 23
If anyone knows how to get rid of the "value." appearing at the beginning of my variable names, that would be great! Or if there is a better bit of code I could use I am very open to that also.
Thanks,
Timothy
Upvotes: 2
Views: 3317
Reputation: 2526
Yes, this can be done with built-in base::reshape()
.
For direction=="wide"
, you can rename reshaped variables using a list with the varying
argument. As you have discovered, reshape
will generate a column name that looks like `v.name`.`times`
. R documentation erroneously suggests that varying
accepts a vector of names (it does for direction=="long"
).
cdc_city <- data.frame(
site=paste("site", rep(1:3, each=5)),
variable=paste("var", rep(1:5, 3)),
value=c(4,7,2,6,3,89,43,12,54,23,76,62,13,43,23))
# site variable value
# 1 site 1 var 1 4
# 2 site 1 var 2 7
# 3 site 1 var 3 2
# 4 site 1 var 4 6
# 5 site 1 var 5 3
# 6 site 2 var 1 89
# 7 site 2 var 2 43
# 8 site 2 var 3 12
# 9 site 2 var 4 54
# 10 site 2 var 5 23
# 11 site 3 var 1 76
# 12 site 3 var 2 62
# 13 site 3 var 3 13
# 14 site 3 var 4 43
# 15 site 3 var 5 23
test <- reshape(cdc_city,
varying=list(c("var 1", "var 2", "var 3", "var 4", "var 5")),
idvar= "site", timevar="variable", direction="wide")
# site var 1 var 2 var 3 var 4 var 5
# 1 site 1 4 7 2 6 3
# 6 site 2 89 43 12 54 23
# 11 site 3 76 62 13 43 23
Be careful about data.frames where strings are factors. You can specify varying
from a factor in the data.frame directly using
reshape( ..., varying=list(as.character(unique(cdc_city$variable))), ...)
Upvotes: 3
Reputation: 193537
For reshaping like this, xtabs
is also useful:
xtabs(value ~ site + variable, data = cdc_city)
# variable
# site var 1 var 2 var 3 var 4 var 5
# site 1 4 7 2 6 3
# site 2 89 43 12 54 23
# site 3 76 62 13 43 23
Regarding your question, though, I don't actually see this as a "problem". That's exactly how it is supposed to work.
Consider the following scenario. You have another column in your data.frame
(we'll call it "other") and you want to reshape that whole dataset from long to wide.
Here's some sample data and output:
set.seed(1)
cdc_city$other = sample(1:20, 15, replace=TRUE)
reshape(cdc_city, direction="wide",
idvar="site", timevar="variable")
# site value.var 1 other.var 1 value.var 2 other.var 2 value.var 3
# 1 site 1 4 6 7 8 2
# 6 site 2 89 18 43 19 12
# 11 site 3 76 5 62 4 13
# other.var 3 value.var 4 other.var 4 value.var 5 other.var 5
# 1 12 6 19 3 5
# 6 14 54 13 23 2
# 11 14 43 8 23 16
In my opinion, tacking on the value
and other
to the variable names is essential in this case.
Finally, your new.row.names
argument is useless here because your unhappiness is with the column names, not row names.
Since I'm in the mood for sharing alternatives, you may also be interested in exploring aggregate
, which also has fairly easy to understand syntax:
aggregate(list(var = cdc_city$value),
list(site = cdc_city$site), c)
# site var.1 var.2 var.3 var.4 var.5
# 1 site 1 4 7 2 6 3
# 2 site 2 89 43 12 54 23
# 3 site 3 76 62 13 43 23
If you use aggregate
, you also have some control over the names of the resulting variables.
Upvotes: 4
Reputation: 179448
I suggest you use the reshape2
package instead - it's much easier to use melt
and dcast
than the built-in reshape()
:
library(reshape2)
dcast(cdc_city, site~variable, mean)
site var 1 var 2 var 3 var 4 var 5
1 site 1 4 7 2 6 3
2 site 2 89 43 12 54 23
3 site 3 76 62 13 43 23
Upvotes: 2