Pietro Fabbro

Reputation: 67

Reshaping wide dataframe to long format

I have a df in the following format:

name other_info revenues_2015 ebitda_2015 ebitda_2016 revenues_2015 other_2017
A Info1 1 2 3 4 5
B Info2 6 7 8 9 10
C Info3 11 12 13 14 15

I would like to change it to long format where I have it structured in the following way:

Name | Info | Year | Metric name | Value

Can you show me how to do that in R? Since the real dataframe has more than 300 columns, is there a way to automate the creation of the year column?


Data:


structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1

Views: 63

Answers (3)

Mr. Caribbean

Reputation: 124

You have two options: the reshape() function from the stats package (part of base R, so you do not need to call library()) or the melt() function from the reshape2 package.

With the function reshape():

 data = structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))

# reshape() selects columns by name, so the duplicated revenues_2015 must be
# renamed first (the second one becomes revenues_2015.1)
names(data) = make.unique(names(data))
measure_cols = setdiff(names(data), c("name", "other_info"))

LF_data = reshape(data = data, idvar = c("name", "other_info"), varying = measure_cols,
    v.names = "Value", times = measure_cols, direction = "long")

Using the package reshape2 melt() function:

  1. First, rebuild the data frame with data.frame(..., stringsAsFactors = FALSE):
       data=data.frame(structure(list(name = structure(1:3, .Label = c("A", "B", "C"
        ), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
        "Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
        3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
        3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
        3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
        3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
        1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
        -3L)), stringsAsFactors = FALSE)

 2. Then:
# measure.vars is omitted: melt() then uses every column not listed in id.vars
LF_data = reshape2::melt(data, id.vars = c("name", "other_info"))

melt() needs every combination of "name", "other_info" and "variable" to be unique. In your example the second revenues_2015 column is therefore renamed to revenues_2015.1 (data.frame() makes duplicated column names unique in step 1).
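Neither snippet above creates the Year column the question asks for. A minimal base R sketch of that step, assuming every measure column is named metric_YYYY and the column names are unique. The toy data (with revenues_2016 in place of the duplicated column) and the names wide, long, id_cols and measure_cols are made up for illustration:

```r
# Hypothetical toy data with unique column names (an assumption, not the OP's data)
wide <- data.frame(
  name = c("A", "B", "C"),
  other_info = c("Info1", "Info2", "Info3"),
  revenues_2015 = c(1, 6, 11),
  ebitda_2015 = c(2, 7, 12),
  ebitda_2016 = c(3, 8, 13),
  revenues_2016 = c(4, 9, 14),
  other_2017 = c(5, 10, 15),
  stringsAsFactors = FALSE
)

id_cols <- c("name", "other_info")
measure_cols <- setdiff(names(wide), id_cols)

# Stack the measure columns: one block of rows per original column
long <- data.frame(
  wide[rep(seq_len(nrow(wide)), times = length(measure_cols)), id_cols],
  variable = rep(measure_cols, each = nrow(wide)),
  Value = unlist(wide[measure_cols], use.names = FALSE),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Split "metric_YYYY" into its two parts
long$Metric <- sub("_\\d{4}$", "", long$variable)
long$Year <- as.integer(sub("^.*_(\\d{4})$", "\\1", long$variable))
```

The two sub() calls should work the same way on the variable column that melt() returns, since they only depend on the column names following the metric_YYYY pattern.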

Upvotes: 1

TarJae

Reputation: 78907

A little late, and similar to the-mad-statter's solution, but slightly different in that it uses mutate:

library(tidyr)
library(dplyr)

df <- structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, -3L)) %>% 
  pivot_longer(revenues_2015:other_2017, names_to = c("Metric name", "Year"),
               names_sep ="_", values_to = "Value") %>% 
  dplyr::mutate(Year = stringr::str_remove(Year, "\\D")) %>% 
  rename(Name=name, Info = other_info)


Upvotes: 0

the-mad-statter

Reputation: 8676

Does this work for you?

library(dplyr)
library(tidyr)

structure(list(name = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), other_info = structure(1:3, .Label = c("Info1", 
"Info2", "Info3"), class = "factor"), revenues_2015 = structure(c(1L, 
3L, 2L), .Label = c("1", "11", "6"), class = "factor"), ebitda_2015 = structure(c(2L, 
3L, 1L), .Label = c("12", "2", "7"), class = "factor"), ebitda_2016 = structure(c(2L, 
3L, 1L), .Label = c("13", "3", "8"), class = "factor"), revenues_2015 = structure(c(2L, 
3L, 1L), .Label = c("14", "4", "9"), class = "factor"), other_2017 = structure(c(3L, 
1L, 2L), .Label = c("10", "15", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L)) %>% 
  pivot_longer(revenues_2015:other_2017, names_pattern = "(.+)_(\\d{4})", names_to = c("metric", "year"))
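For the 300+ column case, a possible variation is to select the measure columns by pattern rather than by range and to convert the year to an integer while pivoting. This is only a sketch: it assumes every measure column ends in _YYYY and that column names are unique, and the toy data below (with revenues_2016 in place of the duplicated column) is made up for illustration:

```r
library(tidyr)

# Hypothetical toy data with unique column names (an assumption, not the OP's data)
wide <- data.frame(
  name = c("A", "B", "C"),
  other_info = c("Info1", "Info2", "Info3"),
  revenues_2015 = c(1, 6, 11),
  ebitda_2015 = c(2, 7, 12),
  ebitda_2016 = c(3, 8, 13),
  revenues_2016 = c(4, 9, 14),
  other_2017 = c(5, 10, 15)
)

long <- pivot_longer(
  wide,
  cols = matches("_\\d{4}$"),                 # every column ending in _YYYY
  names_pattern = "(.+)_(\\d{4})",            # group 1 = metric, group 2 = year
  names_to = c("metric", "year"),
  names_transform = list(year = as.integer),  # store year as integer, not character
  values_to = "value"
)
```

Selecting with matches() avoids having to know the first and last measure column, which may help when the columns are not contiguous.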

Upvotes: 1
