Mohan Chen
Mohan Chen

Reputation: 11

How to convert a complicated wide table into long table

I'm dealing with a complicated wide table looks like this:

date      hour   type  A  B  C  D  E
20100202    0     x    1  2  3  4  5
20100202    0     y    6  7  8  9  10
20100202    1     x   11 12 13 14  15
20100202    1     y   16 17 18 19  20

The format I'm looking for is:

date      hour   object  x   y
20100202    0     A      1   6
20100202    0     B      2   7
20100202    0     C      3   8
20100202    0     D      4   9
20100202    0     E      5  10
20100202    1     A     11  16
20100202    1     B     12  17
20100202    1     C     13  18
20100202    1     D     14  19
20100202    1     E     15  20

I've been looking the similar questions that have been proposed before, but I didn't find much useful solution (although I'm pretty sure that's because I'm pretty new to this language)

Any suggestions or help would be appreciated!

Upvotes: 1

Views: 104

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388907

A tidyverse way would using pivot_longer and pivot_wider :

library(tidyr)

df %>%
  pivot_longer(cols = A:E) %>%
  pivot_wider(names_from = type, values_from = value)


#      date  hour name      x     y
#      <int> <int> <chr> <int> <int>
# 1 20100202     0 A         1     6
# 2 20100202     0 B         2     7
# 3 20100202     0 C         3     8
# 4 20100202     0 D         4     9
# 5 20100202     0 E         5    10
# 6 20100202     1 A        11    16
# 7 20100202     1 B        12    17
# 8 20100202     1 C        13    18
# 9 20100202     1 D        14    19
#10 20100202     1 E        15    20

data

df <- structure(list(date = c(20100202L, 20100202L, 20100202L, 20100202L
), hour = c(0L, 0L, 1L, 1L), type = c("x", "y", "x", "y"), A = c(1L, 
6L, 11L, 16L), B = c(2L, 7L, 12L, 17L), C = c(3L, 8L, 13L, 18L
), D = c(4L, 9L, 14L, 19L), E = c(5L, 10L, 15L, 20L)), 
class = "data.frame", row.names = c(NA, -4L))

Upvotes: 1

MichaelChirico
MichaelChirico

Reputation: 34703

In data.table, this is done with melt and dcast:

library(data.table)
DT = fread("date      hour   type  A  B  C  D  E
20100202    0     x    1  2  3  4  5
20100202    0     y    6  7  8  9  10
20100202    1     x   11 12 13 14  15
20100202    1     y   16 17 18 19  20")

Two steps:

DT[ , melt(.SD, id.vars = c('date', 'hour', 'type'), 
           measure.vars = LETTERS[1:5], variable.name='object')
   ][ , dcast(.SD, date + hour + object ~ type, value.var = 'value')]
#         date hour object  x  y
#  1: 20100202    0      A  1  6
#  2: 20100202    0      B  2  7
#  3: 20100202    0      C  3  8
#  4: 20100202    0      D  4  9
#  5: 20100202    0      E  5 10
#  6: 20100202    1      A 11 16
#  7: 20100202    1      B 12 17
#  8: 20100202    1      C 13 18
#  9: 20100202    1      D 14 19
# 10: 20100202    1      E 15 20
  • melt reshapes your data long. id.vars says the variables to "stretch" and measure.vars says which variables will be "transposed" into a single column.
    • About .SD, see here or the corresponding vignette. I use it here to facilitate chaining. See below for an equivalent version without .SD.
    • variable.name is used to get object in the output as you specified. By default it would be named variable instead.
  • dcast is used to reshape wide again, in the different arrangement requested. the LHS of ~ gives the variables that will remain columns, the RHS says which column's values will become columns (here the values of type, x and y, become columns. The formula gives the shape of the output table; value.var says which variable will "fill out" the shape implied by the formula.

Upvotes: 4

Related Questions