Reputation: 31
I have a database as follow:
col1 <-c("X1",";X2","X3;X4;X5","X6;X7")
[1] "X1" ";X2" "X3;X4;X5" "X6;X7"
col2 = col3 = col4 <- c("5","8","4","1")
[1] "5" "8" "1" "4"
dat <- data.frame(col1,col2,col3,col4)
dat
col1 col2 col3 col4
1 X1 5 5 5
2 ;X2 8 8 8
3 X3;X4;X5 4 4 4
4 X6;X7 1 1 1
I want to remove semi colon in column 1 and if column 1 which contains over one X, it will duplicate new rows. Moreover, column 2 and 3 will duplicate values, column 4 will replace by value 0. I expect it will be:
dat
col1 col2 col3 col4
1 X1 5 5 5
2 X2 8 8 8
3 X3 4 4 4
4 X4 4 4 0
5 X5 4 4 0
6 X6 1 1 1
7 X7 1 1 0
Upvotes: 2
Views: 1976
Reputation: 269526
This splits col1
by semicolon and uses tidyr's unnest
to explode single rows into multiple rows as needed. Next it uses filter
to remove unwanted rows that were generated by leading (or trailing or multiple) semicolons. tidyr puts col1
last so we rearrange using select
. Alternately, if the column order does not matter just omit the select
.
library(dplyr)
library(tidyr)
dat %>%
mutate(col1 = strsplit(as.character(col1), ";")) %>%
unnest(col1) %>%
filter(col1 != "") %>%
select(col1, col2:col4)
giving:
Source: local data frame [7 x 4]
col1 col2 col3 col4
(chr) (fctr) (fctr) (fctr)
1 X1 5 5 5
2 X2 8 8 8
3 X3 4 4 4
4 X4 4 4 4
5 X5 4 4 4
6 X6 1 1 1
7 X7 1 1 1
Upvotes: 6
Reputation: 887078
We can use cSplit
library(splitstackshape)
cSplit(dat, "col1", ";", "long")[col1!=""][duplicated(col4), col4 := as.numeric(0)][]
# col1 col2 col3 col4
#1: X1 5 5 5
#2: X2 8 8 8
#3: X3 4 4 4
#4: X4 4 4 0
#5: X5 4 4 0
#6: X6 1 1 1
#7: X7 1 1 0
col2 = col3 = col4 <- c(5,8,4,1)
dat <- data.frame(col1,col2,col3,col4)
Upvotes: 1