user3799924
user3799924

Reputation: 93

Split column name and convert data from wide to long format in R

I have a large dataset that I need to convert to long format from wide format. That should be simple enough and there are lots of examples of how to do that on this forum. However, in this case, I also need to split the column headers used in the wide format and create a column for each of them in the long format.

Example dataset

 data <- data.frame("East2010"=1:3, "West2010"=4:6, "East2011"=7:9, "West2011"=5:7)
 data
 East.2010 West.2010 East.2011 West.2011
 1         1         4         7         5
 2         2         5         8         6
 3         3         6         9         7

What I want is something like this

 Site   Year   Response
 East   2010   1
 East   2010   2
 East   2010   3
 West   2010   4
 West   2010   5
 West   2010   6
 East   2011   7
 East   2011   8
 East   2011   9
 West   2011   5
 West   2011   6
 West   2011   7

I have looked a lot of examples on this forum that will melt data to convert to long format and others that do a column split at a delimiter, but I have not been able to make the two work together.

Upvotes: 8

Views: 2943

Answers (3)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

Here's the "modern day" :-) approach for this:

library(dplyr)
library(tidyr)
data %>%
  gather(var, Response, East2010:West2011) %>%  ## Makes wide data long
  separate(var, c("Site", "Year"), sep = -5)    ## Splits up a column
#    Site Year Response
# 1  East 2010        1
# 2  East 2010        2
# 3  East 2010        3
# 4  West 2010        4
# 5  West 2010        5
# 6  West 2010        6
# 7  East 2011        7
# 8  East 2011        8
# 9  East 2011        9
# 10 West 2011        5
# 11 West 2011        6
# 12 West 2011        7

The sep = -5 from above says to go from the end of the string backwards five characters and split there. Thus, if you had "North2010" as a possible name, this would still work.

That said, it is more reliable to use a regular expression like @David's, which is also possible within separate:

data %>%
  gather(var, Response, East2010:West2011) %>%
  separate(var, c("Site", "Year"), 
           sep = "(?<=[[:alpha:]])(?=[[:digit:]])", 
           perl = TRUE)

Upvotes: 5

David Arenburg
David Arenburg

Reputation: 92292

Or (in case the column width is not always constant). Here I use a "lookahead" and "lookbehind" in order to separate characters from digits.

library(reshape2)
data <- melt(data)
temp <- strsplit(as.character(data$variable), "(?<=[[:alpha:]])(?=[[:digit:]])", perl = TRUE)
transform(data, Site = sapply(temp, "[", 1), Year = sapply(temp, "[", 2))

#   variable value Site Year
#1  East2010     1 East 2010
#2  East2010     2 East 2010
#3  East2010     3 East 2010
#4  West2010     4 West 2010
#5  West2010     5 West 2010
#6  West2010     6 West 2010
#7  East2011     7 East 2011
#8  East2011     8 East 2011
#9  East2011     9 East 2011
#10 West2011     5 West 2011
#11 West2011     6 West 2011
#12 West2011     7 West 2011

Upvotes: 3

landroni
landroni

Reputation: 2988

Something along these lines would work:

library("plyr")
library("reshape2")
m.data <- melt(data)
m.data <- mutate(m.data, Site=substr(variable, 1,4), 
    Year=substr(variable, 5,8))

Which would result in:

> m.data
   variable value Site Year
1  East2010     1 East 2010
2  East2010     2 East 2010
3  East2010     3 East 2010
4  West2010     4 West 2010
5  West2010     5 West 2010
6  West2010     6 West 2010
7  East2011     7 East 2011
8  East2011     8 East 2011
9  East2011     9 East 2011
10 West2011     5 West 2011
11 West2011     6 West 2011
12 West2011     7 West 2011

Upvotes: 2

Related Questions