Reputation: 439
I'm wondering if there is an easy way to restructure some data I have. I currently have a data frame that looks like this...
Year Cat Number
2001 A 15
2001 B 2
2002 A 4
2002 B 12
But what I ultimately want is to have it in this shape...
Year Cat Number Cat Number
2001 A 15 B 2
2002 A 4 B 12
Is there a simple way to do this?
Thanks in advance
:)
Upvotes: 2
Views: 1221
Reputation: 886938
One way would be to use dcast/melt from reshape2. In the code below, I first create a sequence number (the indx column) within each Year using transform and ave. The transformed dataset is then melted, keeping Year and indx as id.var. The resulting long-format dataset is reshaped to wide format using dcast. If you don't need the numeric suffix (_1, _2) on the column names, you can use gsub to remove it.
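library(reshape2)
# Add a within-Year row index, melt to long format, then cast back to wide
res <- dcast(melt(transform(df, indx = ave(seq_along(Year), Year, FUN = seq_along)),
                  id.var = c("Year", "indx")),
             Year ~ variable + indx, value.var = "value")
# Strip the "_1"/"_2" suffix (this leaves duplicate column names)
colnames(res) <- gsub("\\_.*", "", colnames(res))
res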
# Year Cat Cat Number Number
#1 2001 A B 15 2
#2 2002 A B 4 12
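The two helper steps in that pipeline, the per-Year index and the suffix removal, can be looked at in isolation. This is a minimal base R sketch, not part of the original code; with_indx, wide_names and short_names are names made up for illustration, and wide_names is a hand-written stand-in for the column names that dcast would produce.

```r
# Same example data as used in this answer
df <- data.frame(Year = c(2001L, 2001L, 2002L, 2002L),
                 Cat = c("A", "B", "A", "B"),
                 Number = c(15L, 2L, 4L, 12L),
                 stringsAsFactors = FALSE)

# ave() applies seq_along() within each Year, numbering the rows 1, 2, ... per group
with_indx <- transform(df, indx = ave(seq_along(Year), Year, FUN = seq_along))
with_indx$indx

# gsub() drops everything from the first underscore onward
# (wide_names is a hypothetical stand-in for the dcast column names)
wide_names <- c("Year", "Cat_1", "Cat_2", "Number_1", "Number_2")
short_names <- gsub("_.*", "", wide_names)
short_names
```

Once the suffix is stripped the names are no longer unique, so columns would have to be selected by position rather than by name.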
Or using dplyr/tidyr. The idea is similar to the above. After grouping by the Year column, generate an indx column using mutate, reshape to long format with gather, unite two columns into a single column VarIndx, and then reshape back to wide format with spread. In the last step, mutate_each converts the columns whose names start with Number to numeric.
library(dplyr)
library(tidyr)
res1 <- df %>%
  group_by(Year) %>%
  mutate(indx=row_number()) %>%
  gather("Var", "Val", Cat:Number) %>%
  unite(VarIndx, Var, indx) %>%
  spread(VarIndx, Val) %>%
  mutate_each(funs(as.numeric), starts_with("Number"))
res1
# Source: local data frame [2 x 5]
# Year Cat_1 Cat_2 Number_1 Number_2
#1 2001 A B 15 2
#2 2002 A B 4 12
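With a newer tidyr (1.0.0 or later, which is an assumption about your installation), the gather/unite/spread chain can probably be collapsed into a single pivot_wider call. This is only a sketch of that alternative and not the code from the answer above; res2 is a made-up name. Because Cat and Number are never stacked into one column, Number should keep its integer type and no conversion step is needed.

```r
library(dplyr)
library(tidyr)

# Same example data as used in this answer
df <- data.frame(Year = c(2001L, 2001L, 2002L, 2002L),
                 Cat = c("A", "B", "A", "B"),
                 Number = c(15L, 2L, 4L, 12L),
                 stringsAsFactors = FALSE)

res2 <- df %>%
  group_by(Year) %>%
  mutate(indx = row_number()) %>%   # row index within each Year
  ungroup() %>%
  # spread Cat and Number side by side, one set of columns per indx value
  pivot_wider(names_from = indx, values_from = c(Cat, Number))
res2
```

The column names follow the same Cat_1, Cat_2, Number_1, Number_2 pattern as the spread version.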
Or you can create an indx variable, .id, using getanID from splitstackshape (from comments made by @Ananda Mahto, the author of splitstackshape) and then use reshape from base R.
library(splitstackshape)
reshape(getanID(df, "Year"), direction="wide", idvar="Year", timevar=".id")
# Year Cat.1 Number.1 Cat.2 Number.2
#1: 2001 A 15 B 2
#2: 2002 A 4 B 12
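If installing splitstackshape is not an option, the index can also be built with ave, as in the first approach, and passed to reshape directly. This is a base R sketch under that assumption; the indx column and the wide object are names chosen here for illustration.

```r
# Same example data as used in this answer
df <- data.frame(Year = c(2001L, 2001L, 2002L, 2002L),
                 Cat = c("A", "B", "A", "B"),
                 Number = c(15L, 2L, 4L, 12L),
                 stringsAsFactors = FALSE)

# Row index within each Year, standing in for the .id column from getanID
df$indx <- ave(seq_along(df$Year), df$Year, FUN = seq_along)

# One row per Year; Cat and Number are repeated once per indx value
wide <- reshape(df, direction = "wide", idvar = "Year", timevar = "indx")
wide
```

This keeps Number as an integer column, since reshape never stacks Cat and Number together.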
Data used in the examples above:
df <- structure(list(Year = c(2001L, 2001L, 2002L, 2002L), Cat = c("A",
"B", "A", "B"), Number = c(15L, 2L, 4L, 12L)), .Names = c("Year",
"Cat", "Number"), class = "data.frame", row.names = c(NA, -4L
))
Upvotes: 2