Minh Phúc Huỳnh
Minh Phúc Huỳnh

Reputation: 31

String split and duplicate row in R?

I have a database as follow:

col1 <-c("X1",";X2","X3;X4;X5","X6;X7")
[1] "X1"       ";X2"      "X3;X4;X5" "X6;X7" 
col2 = col3 = col4 <- c("5","8","4","1")
[1] "5" "8" "1" "4"
dat <- data.frame(col1,col2,col3,col4)
 dat
      col1 col2 col3 col4
1       X1    5    5    5
2      ;X2    8    8    8
3 X3;X4;X5    4    4    4
4    X6;X7    1    1    1

I want to remove semi colon in column 1 and if column 1 which contains over one X, it will duplicate new rows. Moreover, column 2 and 3 will duplicate values, column 4 will replace by value 0. I expect it will be:

dat
  col1 col2 col3 col4
1   X1    5    5    5
2   X2    8    8    8
3   X3    4    4    4
4   X4    4    4    0
5   X5    4    4    0
6   X6    1    1    1
7   X7    1    1    0

Upvotes: 2

Views: 1976

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269526

This splits col1 by semicolon and uses tidyr's unnest to explode single rows into multiple rows as needed. Next it uses filter to remove unwanted rows that were generated by leading (or trailing or multiple) semicolons. tidyr puts col1 last so we rearrange using select. Alternately, if the column order does not matter just omit the select.

library(dplyr)
library(tidyr)

   dat %>%
       mutate(col1 = strsplit(as.character(col1), ";")) %>%
       unnest(col1) %>%
       filter(col1 != "") %>%
       select(col1, col2:col4)

giving:

Source: local data frame [7 x 4]

       col1   col2   col3   col4
      (chr) (fctr) (fctr) (fctr)
    1    X1      5      5      5
    2    X2      8      8      8
    3    X3      4      4      4
    4    X4      4      4      4
    5    X5      4      4      4
    6    X6      1      1      1
    7    X7      1      1      1

Upvotes: 6

akrun
akrun

Reputation: 887078

We can use cSplit

library(splitstackshape)
cSplit(dat, "col1", ";", "long")[col1!=""][duplicated(col4), col4 := as.numeric(0)][]
#    col1 col2 col3 col4
#1:   X1    5    5    5
#2:   X2    8    8    8
#3:   X3    4    4    4
#4:   X4    4    4    0
#5:   X5    4    4    0
#6:   X6    1    1    1
#7:   X7    1    1    0

data

col2 = col3 = col4 <- c(5,8,4,1)
 dat <- data.frame(col1,col2,col3,col4)

Upvotes: 1

Related Questions