nils
nils

Reputation: 23

reshape wide into long while splitting

I am looking for reshaping:

  ID p2012 p2010 p2008 p2006 c2012 c2010 c2008 c2006
1  1   160   162   163   165  37.3  37.3  37.1  37.1
2  2   163   164   164   163   2.6   2.6   2.6   2.6

into:

    ID  year    p   c
1   1   2006    165 37.1
2   1   2008    164 37.1
3   1   2010    162 37.3
4   1   2012    160 37.3
5   2   2006    163 2.6
6   2   2008    163 2.6
7   2   2010    164 2.6
8   2   2012    163 2.6

I am new to R, have been trying around with melt and dcast functions, but there are just to many twists for me at this stage. Help would be much appreciated!

A dput of my df:

structure(list(ID = 1:2, p2012 = c(160L, 163L), p2010 = c(162L, 164L), p2008 = 163:164, p2006 = c(165L, 163L), c2012 = c(37.3, 2.6), c2010 = c(37.3, 2.6), c2008 = c(37.1, 2.6), c2006 = c(37.1, 2.6)), .Names = c("ID", "p2012", "p2010", "p2008", "p2006", "c2012", "c2010", "c2008", "c2006"), class = "data.frame", row.names = c(NA, -2L))

Upvotes: 1

Views: 111

Answers (3)

AndrewMacDonald
AndrewMacDonald

Reputation: 2950

You can also use the following solution from tidyr. You don't actually need to use regular expressions, if "p" or "c" is always the first letter of the column names:

library(tidyr)
library(dplyr) # only loaded for the %>% operator

dat %>%
  gather(key,value,p2012:c2006) %>%
  separate(key,c("category","year"),1) %>%
  spread(category,value)
  ID year    c   p
1  1 2006 37.1 165
2  1 2008 37.1 163
3  1 2010 37.3 162
4  1 2012 37.3 160
5  2 2006  2.6 163
6  2 2008  2.6 164
7  2 2010  2.6 164
8  2 2012  2.6 163

Upvotes: 2

Thilo
Thilo

Reputation: 9157

An alternative to shadow's answer is to use the reshape function:

reshape(d, direction='long', varying=list(2:5, 6:9), v.names=c("p", "c"), idvar="ID", times=c(2012, 2010, 2008, 2006))

This assumes that you know the column indices of the p and c beforehand (or add additional code to figure them out). Furthermore, the times vector above could be found by using something similar to the gsub function of shadow.

Which way to use probably is a matter of taste.

Upvotes: 4

shadow
shadow

Reputation: 22293

You probably have to melt the data first, then split the variable and the year and then dcast to your final data.frame.

require(reshape2)
# melt data.frame
dfmelt <- melt(df, id.vars="ID", variable.name="var.year")
# split "var.year" into new variables "var" and "year" 
dfmelt[, "var"] <- gsub("[0-9]", "", as.character(dfmelt[, "var.year"]))
dfmelt[, "year"] <- as.numeric(gsub("[a-z, A-Z]", "", as.character(dfmelt[, "var.year"])))
# cast to data with column for each var-name
dcast(dfmelt, ID+year~var, value.var="value")

Upvotes: 2

Related Questions