Reputation: 41
I think I am so close to the solution, I just need a few pointers in the right direction and I would very much appreciate straightening this out!
So I have a dataset that I want to turn into only two columns. Here is a similar dummy dataset.
1 1.1 1.2 1.3 1.4
2 2.1 2.2 2.3 2.4
3 3.1 3.2 3.3 3.4
4 4.1 4.2 4.3 4.4
It is a csv, so when I import it R attaches its own headers, like so:
V1 V2 V3 V4 V5
1 1.1 1.2 1.3 1.4
2 2.1 2.2 2.3 2.4
3 3.1 3.2 3.3 3.4
4 4.1 4.2 4.3 4.4
And I want it to look like this:
id value
1 1.1
1 1.2
1 1.3
1 1.4
2 2.1
2 2.2
...
4 4.4
The problem is, this is an ongoing project and there will be more columns to the dataset (V6, V7, etc) so I can't hard-code anything. I have a list saved of all the heading names, and this appears to work.
data <- read.csv(file="location", header = FALSE)
dates = ncol(data)
list = 2:dates
variables <-paste0('V',list)
so now variables is a list of the column names for all of the columns that I want to condense into one.
My code for reshaping is this:
newdata <- reshape(data, idvar = "V1", direction = "long", varying = variables, sep="")
but it gives me an unexpected data frame as a result. I get:
row.names V1 time V
1.2 1 2 1.1
2.2 2 2 2.1
3.2 3 2 3.1
4.2 4 2 4.1
1.3 1 3 1.2
...
The V1 and V columns are correct - it's what I want and it's matched up correctly, if not sorted. However, where did row.names and time come from? I can delete time but not row.names because when I try to access the column row.names with newdata[,1] it gives me the V1 column, and newdata["row.names"] says "undefined columns selected."
So if someone could tell me what I am doing wrong or how to reformat my reshape statement so these weird columns do not appear I would be very grateful. Thank you!
Upvotes: 4
Views: 2884
Reputation:
If this is a data frame, you can do it through reshape 2 easily with melt()
. Something like:
newdata <- melt(data, measure.vars = 1:ncol(data))
This'll give you a data frame of "Variable" (containing each column name) and "value" (containing the values attached to each column name).
If you want to orient the dataframe around V1, you probably want melt(data, id.vars = 1, measure.vars = 2:ncol(data))
instead.
Upvotes: 0
Reputation: 193547
First, the behavior you describe is from using View
or fix
, where if there are row.names
that are not just a sequence of the number of rows in the dataset, in the spreadsheet view, it shows as another column of data named "row.names".
Here's a small example:
## Sample data
df1 <- df2 <- data.frame(matrix(1:4, ncol = 2,
dimnames = list(c("A", "B"), c("a", "b"))))
rownames(df2) <- NULL
fix(df1) # R's spreadsheet view
View(df1) # RStudio data viewer
fix(df2)
View(df2)
reshape
to work as you expectSecond, the reshape
function in base R has a new.row.names
argument. Unfortunately, you can't simply set that to NULL
. You need to set it to a sequential vector if you want to get rid of the strange row.names
that are created by default. To do that, you need to know how long your final data will be (the product of the number of varying columns by the number of rows in the original dataset). As such, you can do something like:
id <- "V1"
varCols <- setdiff(names(mydf), "V1")
out <- reshape(mydf, direction = "long", idvar=id, varying=varCols, sep = "",
new.row.names=sequence(prod(length(varCols), nrow(mydf))))
This still leaves the time
variable, so you need to manually remove that like:
out$time <- NULL
out
# V1 V
# 1 1 1.1
# 2 2 2.1
# 3 3 3.1
# 4 4 4.1
# <:::SNIP:::>
# 12 4 4.3
# 13 1 1.4
# 14 2 2.4
# 15 3 3.4
# 16 4 4.4
Alternatively, you can do what you did, then set row.names(out) <- NULL
instead of using the new.row.names
argument from reshape
.
In base R, another approach is to use stack
and just drop the "ind" column (which is the first column in the stacked data). Then, just bind that back in with the "id" column.
cbind(mydf[1], stack(mydf[-1])[1])
# V1 values
# 1 1 1.1
# 2 2 2.1
# 3 3 3.1
# 4 4 4.1
# <:::SNIP:::>
# 12 4 4.3
# 13 1 1.4
# 14 2 2.4
# 15 3 3.4
# 16 4 4.4
Alternatively, as has already been mentioned in comments, use melt
from "reshape2":
install.packages("reshape2") ## if it is not yet installed
library(reshape2)
out2 <- melt(mydf, id.vars="V1")
out2$variable <- NULL
out2
# V1 value
# 1 1 1.1
# 2 2 2.1
# 3 3 3.1
# 4 4 4.1
# <:::SNIP:::>
# 12 4 4.3
# 13 1 1.4
# 14 2 2.4
# 15 3 3.4
# 16 4 4.4
Upvotes: 4