Reputation: 2385
I would like to split rows that has strings separated by /
or ab
into new rows. Please see input and output examples below. Maybe we could use strsplit(d$values,split='/', fixed=TRUE)
, but how do you make them into new rows?
head
> head(d,3)
values ind
54 miR-197 9846
55 miR-197 9846
113 miR-221/222/222ab/1928 56204
dput
structure(list(values = c("miR-197", "miR-197", "miR-221/222/222ab/1928"
), ind = structure(c(4L, 4L, 6L), .Label = c("6482", "4057",
"60481", "9846", "7414", "56204", "84957", "29924", "6095", "8301",
"2355", "88455", "23047", "57590", "5829", "162", "3091", "9766",
"23406", "3646", "22870", "22898", "8775", "8178", "2077", "64115",
"6158", "5007", "8567", "10019", "26127", "4739", "6678", "27013",
"6146", "51060", "1997", "847", "4035", "79026", "8192", "5782",
"1032", "4354", "5791", "2752", "9873", "6386", "5962", "2230",
"6938", "6727", "7090", "92912", "55784", "409", "23521", "6279",
"51312", "7357", "2040", "2934", "9219", "2180", "219333", "114908",
"50807", "90268", "3098", "1974", "56990", "7791", "162989",
"9159", "7086", "51762", "9318", "23582", "10632", "54815", "1938",
"3576", "11214", "167227", "156", "2745", "6138", "391", "10933",
"5501", "3638", "2316", "2869", "10527", "255809"), class = "factor")), row.names = c(54L,
55L, 113L), class = "data.frame")
expected output
values ind
miR-197 9846
miR-197 9846
miR-221 56204
222a 56204
222b 56204
1928 56204
Upvotes: 3
Views: 60
Reputation: 525
Shamelessly stealing Ronak Shah's initial solution. Then splitting the 'ab'.
This solution is not generally applicable to all letters, so attention and modification would need to be used if trying to scale this to a larger problem with letters beyond "a" and "b".
First we add an index column to help with ordering, which will be removed at the end.
df <- tidyr::separate_rows(df, values, sep = "\\/")
df$index <- seq_len(nrow(df))
df_ab <- df[grep('ab',df$values),]
df$values <- gsub("ab","a",df$values)
df_ab$values <- gsub("ab","b",df_ab$values)
df <- rbind(df,df_ab)
df <- df[order(df$index),]
df$index <- NULL
## values ind
## 1 miR-197 9846
## 2 miR-197 9846
## 3 miR-221 56204
## 4 222 56204
## 5 222a 56204
## 51 222b 56204
## 6 1928 56204
Upvotes: 2