Lucien S.
Lucien S.

Reputation: 5345

How to reshape a csv table in R?

I have this dataset:

   Group Group Group Cat Cat Cat  Betw
1      a               A          5.87
2      b           j   A          0.11
3      c               B       A  2.18
4      d               C   D      5.31
5      e               E   C      0.00
6      f               E         352.10
7      g               E          0.35
8      h               A   B      0.00
9      i     m         F          0.00
10     j               A   D      15.04

Which I would like to reshape so that there is only 3 columns : Var1 (which would be 'Group' or 'Cat'), Var2 (which would have a lower case or uppercase letter) and Betw.

So for instance, c, b and A all have a value of 2.1892749,

   Var1 Var2  Betw
1 Group    a  5.87
2   Cat    A  5.87
3 Group    b  0.11
4 Group    j  0.11
5   Cat    A  0.11
...

How to do this with R?

Upvotes: 1

Views: 129

Answers (3)

jMathew
jMathew

Reputation: 1057

I guess directly applying melt doesn't work for you because of the duplicated column names in the data frame. So along the lines of @akrun, you can use something like this

tmp <- data.frame(df, check.names=T)
tmp <- melt(tmp, id="Betw", variable.name="Var1", value.name="Var2")
tmp$Var1 <- gsub("(.*)\\.[0-9]", "\\1", tmp$Var1)
df <- subset(tmp, Var2!="")

The data frame I used,

df <- data.frame(Group=c("a","b","c","d","e","f","g","h","i","j"),
                 Group=c("","","","","","","","","m",""),
                 Group=c("","j","","","","","","","",""),
                 Cat=c("A","A","B","C","E","E","E","A","F","A"),
                 Cat=c("","","","D","C","","","B","","D"),
                 Cat=c("","","A","","","","","","",""),
                 Betw=c(5.87,0.11,2.18,5.31,0,352.1,0.35,0,0,15.04),
                 check.names = F)

Upvotes: 1

akrun
akrun

Reputation: 887901

We could also use data.table. We convert the 'data.frame' to 'data.table' (setDT(dat), reshape to long format with melt, delete the rows that are blank for 'Var2', and remove the substring in 'Var1' that starts with . to the end of the string (if present).

library(data.table)#v1.9.6+
melt(setDT(dat), id.var='Betw', variable.name='Var1', 
        value.name='Var2')[Var2!=''][, Var1:= sub('\\..*', '', Var1)][]

Upvotes: 2

jeremycg
jeremycg

Reputation: 24945

You can use dplyr and tidyr. First we gather to long data, then remove the extra numbers put onto the columns, then we remove the blanks:

library(dplyr)
library(tidyr)
dat %>% gather(Var1, Var2, -Betw) %>%
        mutate(Var1 = gsub(".[0-9]$", "", Var1)) %>%
        filter(Var2 != "") 

data used:

structure(list(Group = structure(1:10, .Label = c("a", "b", "c", 
"d", "e", "f", "g", "h", "i", "j"), class = "factor"), Group.1 = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("", "m"), class = "factor"), 
    Group.2 = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("", "j"), class = "factor"), Cat = structure(c(1L, 
    1L, 2L, 3L, 4L, 4L, 4L, 1L, 5L, 1L), .Label = c("A", "B", 
    "C", "E", "F"), class = "factor"), Cat.1 = structure(c(1L, 
    1L, 1L, 4L, 3L, 1L, 1L, 2L, 1L, 4L), .Label = c("", "B", 
    "C", "D"), class = "factor"), Cat.2 = structure(c(1L, 1L, 
    2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "A"), class = "factor"), 
    Betw = c(5.87, 0.11, 2.18, 5.31, 0, 352.1, 0.35, 0, 0, 15.04
    )), .Names = c("Group", "Group.1", "Group.2", "Cat", "Cat.1", 
"Cat.2", "Betw"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"))

Upvotes: 0

Related Questions