Andre Elrico
Andre Elrico

Reputation: 11480

reduce number of rows by cbinding data, also rename columns

I want to "partially" make the data.frame "wider" Not really succeeding in doing so.

data looks like

 ID X1 X2 X3 X4
 1l  1  5  9 13
 1r  2  6 10 14
 2r  3  7 11 15
 2l  4  8 12 16

desired output

 ID X1_l X2_l X3_l X4_l X1_r X2_r X3_r X4_r
  1    1    5    9   13    2    6   10   14
  2    4    8   12   16    3    7   11   15

rep data

df <- data.frame(ID=c("1l","1r","2r","2l"),matrix(1:16,ncol=4))

please note:

  1. the order of l and r are not always the same. As shown in the example. So no cheating possible here. We could do a natural sort to solve that but let's don't do it. Possibly there is a single r or l number. So for e.g., there exists 18l but NOT 18r.

My try is to first split the data up by same numbers: Like so

split(x = df,f = sub("[lr]","",df$ID))

Any fast solutions for a problem like that?

Edit

@Arkun

Error Message:

    Error: Duplicate identifiers for rows (7561, 7562), (7581, 7582), (7599, 7600), (7619, 7620), (7563, 7564), (7583, 7584), (7601, 7602), (7621, 7622), (7565, 7566), (7585, 7586), (7603, 7604), (7623, 7624), (7567, 7568), (7587, 7588), (7605, 7606), (7625, 7626), (7569, 7570), (7607, 7608), (7571, 7572), (7589, 7590), (7609, 7610), (7573, 7574), (7591, 7592), (7611, 7612), (7575, 7576), (7593, 7594), (7613, 7614), (7577, 7578), (7595, 7596), (7615, 7616), (7579, 7580), (7597, 7598), (7617, 7618), (6577, 6578), (6597, 6598), (6615, 6616), (6635, 6636), (6579, 6580), (6599, 6600), (6617, 6618), (6637, 6638), (6581, 6582), (6601, 6602), (6619, 6620), (6639, 6640), (6583, 6584), (6603, 6604), (6621, 6622), (6641, 6642), (6585, 6586), (6623, 6624), (6587, 6588), (6605, 6606), (6625, 6626), (6589, 6590), (6607, 6608), (6627, 6628), (6591, 6592), (6609, 6610), (6629, 6630), (6593, 6594), (6611, 6612), (6631, 6632), (6595, 6596), (6613, 6614), (6633, 6634), (7643, 7644), (7663, 7664), (7681, 76
In addition: Warning message:
Too many values at 66 locations: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, ... 

About the real data:

43 patients (1r,1l, ... 43r,43l) About 112 Variables. First after ID is called v2_1_prot and last v2_80_CK

So i replaced the part of x1:x4 with v2_1_prot:v2_80_CK. Am I doing something wrong here?

I checked my ID variable there are not duplicates so far.

all_pat_rbind2$ID %>% count %>% extract2("freq") %>% max # is equal to one

Upvotes: 1

Views: 89

Answers (2)

akrun
akrun

Reputation: 887213

We can use methods from tidyverse

library(dplyr)
library(tidyr)
df %>%
   separate(ID, into = c("ID", "v1"), sep="(?<=\\d)(?![0-9])") %>% 
   gather(key, val,  -one_of("ID", "v1"))  %>% 
   unite(key_v1, key, v1) %>% 
   spread(key_v1, val)
#  ID X1_l X1_r X2_l X2_r X3_l X3_r X4_l X4_r
#1  1    1    2    5    6    9   10   13   14
#2  2    4    3    8    7   12   11   16   15

Upvotes: 3

d.b
d.b

Reputation: 32548

temp = gsub("\\D+", "", df$ID)
data.frame(ID = unique(temp), do.call(cbind, lapply(X = c("l", "r"),
    FUN = function(f) setNames(object = data.frame(sapply(X = names(df)[-1],
        FUN = function(cx) sapply(X = unique(temp),
            FUN = function(rx) df[df$ID == paste0(rx,f),cx]))),
        nm = paste(names(df)[-1], f, sep = "_")))))
#  ID X1_l X2_l X3_l X4_l X1_r X2_r X3_r X4_r
#1  1    1    5    9   13    2    6   10   14
#2  2    4    8   12   16    3    7   11   15

Upvotes: 3

Related Questions