Reputation: 5977
I'm having trouble rearranging the following data frame:
set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)
dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357
I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:
name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
I've looked at melt
and cast
and a few other things, but none seem to do the job.
Upvotes: 385
Views: 516193
Reputation: 388797
I have recently started looking into collapse
package which is super fast and useful. In collapse
we can use the pivot
function to do this transformation.
library(collapse)
pivot(dat1, "name", "value", "numbers", how = "wider")
# name 1 2 3 4
#1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
#2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
Upvotes: 1
Reputation: 886938
With tidyr, there is pivot_wider()
and pivot_longer()
which are generalized to do reshaping from long -> wide or wide -> long, respectively. Using the OP's data:
single column long -> wide
library(tidyr)
dat1 %>%
pivot_wider(names_from = numbers, values_from = value)
# # A tibble: 2 x 5
# name `1` `2` `3` `4`
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 firstName 0.341 -0.703 -0.380 -0.746
# 2 secondName -0.898 -0.335 -0.501 -0.175
multiple columns long -> wide
pivot_wider()
is also capable of more complex pivot operations. For example, you can pivot multiple columns simultaneously:
# create another column for showing the functionality
dat2 <- dat1 %>%
dplyr::rename(valA = value) %>%
dplyr::mutate(valB = valA * 2)
dat2 %>%
pivot_wider(names_from = numbers, values_from = c(valA, valB))
# # A tibble: 2 × 9
# name valA_1 valA_2 valA_3 valA_4 valB_1 valB_2 valB_3 valB_4
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 firstName 0.341 -0.703 -0.380 -0.746 0.682 -1.41 -0.759 -1.49
# 2 secondName -0.898 -0.335 -0.501 -0.175 -1.80 -0.670 -1.00 -0.349
There is much more functionality to be found in the docs.
Upvotes: 62
Reputation: 151
Came here via a linked question Reshape three column data frame to matrix ("long" to "wide" format). That question is closed, so I writing an alternative solution here.
I found a alternative solution, perhaps useful for someone looking for converting three columns to a matrix. I am referring to decoupleR (2.3.2) package. Below is copied from their site
Generates a kind of table where the rows come from id_cols, the columns from names_from and the values from values_from.
Usage
pivot_wider_profile(
data,
id_cols,
names_from,
values_from,
values_fill = NA,
to_matrix = FALSE,
to_sparse = FALSE,
...
)
Upvotes: 1
Reputation: 8863
This works even if you have missing pairs and it doesn't require sorting (as.matrix(dat1)[,1:2]
can be replaced with cbind(dat1[,1],dat1[,2])
):
> set.seed(45);dat1=data.frame(name=rep(c("firstName","secondName"),each=4),numbers=rep(1:4,2),value=rnorm(8))
> u1=unique(dat1[,1]);u2=unique(dat1[,2])
> m=matrix(nrow=length(u1),ncol=length(u2),dimnames=list(u1,u2))
> m[as.matrix(dat1)[,1:2]]=dat1[,3]
> m
1 2 3 4
firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
This doesn't work if you have missing pairs and it requires sorting, but it's a bit shorter in case the pairs are already sorted:
> u1=unique(dat1[,1]);u2=unique(dat1[,2])
> dat1=dat1[order(dat1[,1],dat1[,2]),] # not actually needed in this case
> matrix(dat1[,3],length(u1),,T,list(u1,u2))
1 2 3 4
firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
Here's a function version of the first approach (add as.data.frame
to make it work with tibbles):
l2w=function(x,row=1,col=2,val=3,sort=F){
u1=unique(x[,row])
u2=unique(x[,col])
if(sort){u1=sort(u1);u2=sort(u2)}
out=matrix(nrow=length(u1),ncol=length(u2),dimnames=list(u1,u2))
out[cbind(x[,row],x[,col])]=x[,val]
out
}
Or if you only have the values of the lower triangle, you can do this:
> euro=as.matrix(eurodist)[1:3,1:3]
> lower=data.frame(V1=rownames(euro)[row(euro)[lower.tri(euro)]],V2=colnames(euro)[col(euro)[lower.tri(euro)]],V3=euro[lower.tri(euro)])
> lower
V1 V2 V3
1 Barcelona Athens 3313
2 Brussels Athens 2963
3 Brussels Barcelona 1318
> n=unique(c(lower[,1],lower[,2]))
> full=rbind(lower,setNames(lower[,c(2,1,3)],names(lower)),data.frame(V1=n,V2=n,V3=0))
> full
V1 V2 V3
1 Barcelona Athens 3313
2 Brussels Athens 2963
3 Brussels Barcelona 1318
4 Athens Barcelona 3313
5 Athens Brussels 2963
6 Barcelona Brussels 1318
7 Athens Athens 0
8 Barcelona Barcelona 0
9 Brussels Brussels 0
> l2w(full,sort=T)
Athens Barcelona Brussels
Athens 0 3313 2963
Barcelona 3313 0 1318
Brussels 2963 1318 0
Or here's another approach:
> rc=as.matrix(lower[-3])
> n=sort(unique(c(rc)))
> m=matrix(0,length(n),length(n),,list(n,n))
> m[rc]=lower[,3]
> m[rc[,2:1]]=lower[,3]
> m
Athens Barcelona Brussels
Athens 0 3313 2963
Barcelona 3313 0 1318
Brussels 2963 1318 0
Another simple method in base R is to use xtabs
. The result of xtabs
is basically just a matrix with a fancy class name, but you can make it look like a regular matrix with class(x)=NULL;attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x))
:
> x=xtabs(value~name+numbers,dat1);x
numbers
name 1 2 3 4
firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
> str(x)
'xtabs' num [1:2, 1:4] 0.341 -0.898 -0.703 -0.335 -0.38 ...
- attr(*, "dimnames")=List of 2
..$ name : chr [1:2] "firstName" "secondName"
..$ numbers: chr [1:4] "1" "2" "3" "4"
- attr(*, "call")= language xtabs(formula = value ~ name + numbers, data = dat1)
> class(x)
[1] "xtabs" "table"
> class(as.matrix(x)) # `as.matrix` has no effect because `x` is already a matrix
[1] "xtabs" "table"
> class(x)=NULL;class(x)
[1] "matrix" "array"
> attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x))
> x # now it looks like a regular matrix
1 2 3 4
firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
> str(x)
num [1:2, 1:4] 0.341 -0.898 -0.703 -0.335 -0.38 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "firstName" "secondName"
..$ : chr [1:4] "1" "2" "3" "4"
Normally as.data.frame(x)
converts the result of xtabs
back to long format, but you can avoid it with class(x)=NULL
:
> x=xtabs(value~name+numbers,dat1);as.data.frame(x)
name numbers Freq
1 firstName 1 0.3407997
2 secondName 1 -0.8981073
3 firstName 2 -0.7033403
4 secondName 2 -0.3347941
5 firstName 3 -0.3795377
6 secondName 3 -0.5013782
7 firstName 4 -0.7460474
8 secondName 4 -0.1745357
> class(x)=NULL;as.data.frame(x)
1 2 3 4
firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
This converts data in wide fromat to long format (unlist
converts a dataframe to a vector and c
converts a matrix to a vector):
w2l=function(x)data.frame(V1=rownames(x)[row(x)],V2=colnames(x)[col(x)],V3=unname(c(unlist(x))))
Upvotes: 1
Reputation: 1996
Using only dplyr
and map
.
library(dplyr)
library(purrr)
set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2), value = rnorm(8)
)
longer_to_wider <- function(data, name_from, value_from){
group <- colnames(data)[!(colnames(data) %in% c(name_from,value_from))]
data %>% group_by(.data[[group]]) %>%
summarise( name = list(.data[[name_from]]),
value = list(.data[[value_from]])) %>%
{
d <- data.frame(
name = .[[name_from]] %>% unlist() %>% unique()
)
e <- map_dfc(.[[group]],function(x){
y <- data_frame(
x = data %>% filter(.data[[group]] == x) %>% pull(value_from)
)
colnames(y) <- x
y
})
cbind(d,e)
}
}
longer_to_wider(dat1, "name", "value")
# name 1 2 3 4
# 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
# 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
Upvotes: -1
Reputation: 145745
The new (in 2014) tidyr
package also does this simply, with gather()
/spread()
being the terms for melt
/cast
.
Edit: Now, in 2019, tidyr v 1.0 has launched and set spread
and gather
on a deprecation path, preferring instead pivot_wider
and pivot_longer
, which you can find described in this answer. Read on if you want a brief glimpse into the brief life of spread/gather
.
library(tidyr)
spread(dat1, key = numbers, value = value)
From github,
tidyr
is a reframing ofreshape2
designed to accompany the tidy data framework, and to work hand-in-hand withmagrittr
anddplyr
to build a solid pipeline for data analysis.Just as
reshape2
did less than reshape,tidyr
does less thanreshape2
. It's designed specifically for tidying data, not the general reshaping thatreshape2
does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, andtidyr
provides no margins or aggregation.
Upvotes: 169
Reputation: 181
much easier way!
devtools::install_github("yikeshu0611/onetree") #install onetree package
library(onetree)
widedata=reshape_toWide(data = dat1,id = "name",j = "numbers",value.var.prefix = "value")
widedata
name value1 value2 value3 value4
firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
if you want to go back from wide to long, only change Wide to Long, and no changes in objects.
reshape_toLong(data = widedata,id = "name",j = "numbers",value.var.prefix = "value")
name numbers value
firstName 1 0.3407997
secondName 1 -0.8981073
firstName 2 -0.7033403
secondName 2 -0.3347941
firstName 3 -0.3795377
secondName 3 -0.5013782
firstName 4 -0.7460474
secondName 4 -0.1745357
Upvotes: 7
Reputation: 6363
The base reshape
function works perfectly fine:
df <- data.frame(
year = c(rep(2000, 12), rep(2001, 12)),
month = rep(1:12, 2),
values = rnorm(24)
)
df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
df_wide
Where
idvar
is the column of classes that separates rowstimevar
is the column of classes to cast widev.names
is the column containing numeric valuesdirection
specifies wide or long formatsep
argument is the separator used in between timevar
class names and v.names
in the output data.frame
. If no idvar
exists, create one before using the reshape()
function:
df$id <- c(rep("year1", 12), rep("year2", 12))
df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
df_wide
Just remember that idvar
is required! The timevar
and v.names
part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.
Upvotes: 18
Reputation: 388797
Using base R aggregate
function:
aggregate(value ~ name, dat1, I)
# name value.1 value.2 value.3 value.4
#1 firstName 0.4145 -0.4747 0.0659 -0.5024
#2 secondName -0.8259 0.1669 -0.8962 0.1681
Upvotes: 20
Reputation: 3045
There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat
, seplyr
and replyr
) called cdata
. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:
The whole system is based on two primitives or operators cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These operators have pivot, un-pivot, one-hot encode, transpose, moving multiple rows and columns, and many other transforms as simple special cases.
It is easy to write many different operations in terms of the cdata primitives. These operators can work-in memory or at big data scale (with databases and Apache Spark; for big data use the cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN() variants). The transforms are controlled by a control table that itself is a diagram of (or picture of) the transform.
We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.
library(cdata)
# first build the control table
pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
columnToTakeKeysFrom = 'numbers', # this will become column headers
columnToTakeValuesFrom = 'value', # this contains data
sep="_") # optional for making column names
# perform the move of data to columns
dat_wide <- moveValuesToColumnsD(tallTable = dat1, # reference to dataset
keyColumns = c('name'), # this(these) column(s) should stay untouched
controlTable = pivotControlTable# control table above
)
dat_wide
#> name numbers_1 numbers_2 numbers_3 numbers_4
#> 1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
Upvotes: 12
Reputation:
Using your example dataframe, we could:
xtabs(value ~ name + numbers, data = dat1)
Upvotes: 31
Reputation: 26248
Another option if performance is a concern is to use data.table
's extension of reshape2
's melt & dcast functions
(Reference: Efficient reshaping using data.tables)
library(data.table)
setDT(dat1)
dcast(dat1, name ~ numbers, value.var = "value")
# name 1 2 3 4
# 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078
# 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814
And, as of data.table v1.9.6 we can cast on multiple columns
## add an extra column
dat1[, value2 := value * 2]
## cast multiple value columns
dcast(dat1, name ~ numbers, value.var = c("value", "value2"))
# name value_1 value_2 value_3 value_4 value2_1 value2_2 value2_3 value2_4
# 1: firstName 0.1836433 -0.8356286 1.5952808 0.3295078 0.3672866 -1.6712572 3.190562 0.6590155
# 2: secondName -0.8204684 0.4874291 0.7383247 0.5757814 -1.6409368 0.9748581 1.476649 1.1515627
Upvotes: 65
Reputation: 13570
Other two options:
Base package:
df <- unstack(dat1, form = value ~ numbers)
rownames(df) <- unique(dat1$name)
df
sqldf
package:
library(sqldf)
sqldf('SELECT name,
MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1,
MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
FROM dat1
GROUP BY name')
Upvotes: 28
Reputation: 10437
You can do this with the reshape()
function, or with the melt()
/ cast()
functions in the reshape package. For the second option, example code is
library(reshape)
cast(dat1, name ~ numbers)
Or using reshape2
library(reshape2)
dcast(dat1, name ~ numbers)
Upvotes: 87
Reputation: 69151
Using reshape
function:
reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
Upvotes: 374