Reputation: 1112
I have a data frame M. I would like to extract the first part of each string separated by ":". I used strsplit, but the result is a large character vector, not a data frame. Could someone please help with this?
M <- read.table(text=
"1/1:205,54,0:18:0:57 1/1:141,39,0:13:0:42 0/0:0,54,255:18:0:45 1/1:174,48,0:16:0:51 0/0:0,84,255:28:0:75
0/0:0,78,255:26:0:99 0/0:0,63,255:21:0:86 0/0:0,45,255:15:0:68 0/0:0,48,255:16:0:71 0/0:0,132,255:44:0:99
0/0:0,78,255:26:0:89 0/0:0,78,255:26:0:89 0/0:0,36,255:12:0:47 0/0:0,33,255:11:0:44 0/0:0,108,255:36:0:99
0/0:0,75,255:25:0:99 0/0:0,54,255:18:0:78 0/0:0,69,255:23:0:93 0/0:0,33,255:11:0:57 0/0:0,96,255:32:0:99
0/0:0,60,75:21:0:74 0/0:0,51,84:17:0:65 0/0:0,48,64:17:0:62 0/0:0,42,65:15:0:56 0/0:0,84,99:28:0:98 ",
head=F, stringsAsFactors=F)
S <- sapply(strsplit(M, ":"), "[", 1)
Upvotes: 2
Views: 2174
Reputation: 887128
It may not be best to use strsplit, as we are only interested in a substring. Assuming that the OP is interested in understanding how strsplit can be used for this example dataset, a modification of the OP's code would be to use a nested lapply/sapply loop.
M[] <- lapply(M, function(x) sapply(strsplit(as.character(x), ':'),'[',1))
M
# V1 V2 V3 V4 V5
#1 1/1 1/1 0/0 1/1 0/0
#2 0/0 0/0 0/0 0/0 0/0
#3 0/0 0/0 0/0 0/0 0/0
#4 0/0 0/0 0/0 0/0 0/0
#5 0/0 0/0 0/0 0/0 0/0
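To see why the nested loop is needed: strsplit works on a character vector rather than on a whole data frame, and it returns a list with one element per input string. A minimal sketch on a two-by-two subset of the question's data (the names M_small, pieces, and first are illustrative only):

```r
# Two rows and two columns taken from the question's data
M_small <- data.frame(
  V1 = c("1/1:205,54,0:18:0:57", "0/0:0,78,255:26:0:99"),
  V2 = c("1/1:141,39,0:13:0:42", "0/0:0,63,255:21:0:86"),
  stringsAsFactors = FALSE
)

# strsplit() on one column returns a list: one character vector per string
pieces <- strsplit(M_small$V1, ":")

# "[" with index 1 picks the first piece from every list element
first <- sapply(pieces, "[", 1)
first

# Repeating that step for each column and assigning into M_small[]
# keeps the data frame structure and the column names
M_small[] <- lapply(M_small, function(x) sapply(strsplit(x, ":"), "[", 1))
M_small
```

The inner sapply handles the strings within one column, while the outer lapply walks over the columns.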
Or, as the columns are all similar, we can unlist the data frame, use strsplit on the resulting vector, and assign the output back to the original dataset so that its structure is kept intact.
M[] <- sapply(strsplit(unlist(M), ':'),'[',1)
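The intermediate steps can be spelled out on a small subset of the question's data. This is only a sketch; M_small, flat, and first are illustrative names:

```r
# Two rows and two columns taken from the question's data
M_small <- data.frame(
  V1 = c("1/1:205,54,0:18:0:57", "0/0:0,78,255:26:0:99"),
  V2 = c("1/1:141,39,0:13:0:42", "0/0:0,63,255:21:0:86"),
  stringsAsFactors = FALSE
)

# unlist() flattens the data frame column by column into one character vector
flat <- unlist(M_small)
names(flat)   # "V11" "V12" "V21" "V22"

# One strsplit() call now covers every cell
first <- sapply(strsplit(flat, ":"), "[", 1)

# Empty brackets refill the existing columns in the same column-wise order,
# so M_small stays a data frame with its original dimensions and names
M_small[] <- first
M_small
```

Because both unlist and the M_small[] assignment work column by column, every value lands back in the cell it came from.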
Or a faster option would be using stri_extract_first from stringi to extract the leading run of characters that are not ":".
library(stringi)
M[] <- stri_extract_first(unlist(M), regex='[^:]+')
Upvotes: 5
Reputation: 21621
Try:
dplyr::mutate_each(M, funs(sub("(.*?)(:.*)", "\\1" , .)))
Which gives:
# V1 V2 V3 V4 V5
#1 1/1 1/1 0/0 1/1 0/0
#2 0/0 0/0 0/0 0/0 0/0
#3 0/0 0/0 0/0 0/0 0/0
#4 0/0 0/0 0/0 0/0 0/0
#5 0/0 0/0 0/0 0/0 0/0
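Note that mutate_each() and funs() were deprecated in later dplyr releases. Assuming dplyr 1.0.0 or newer is installed, roughly the same thing can be sketched with across(); M_small and res are illustrative names, and the data is a subset of the question's:

```r
library(dplyr)

# Two rows and two columns taken from the question's data
M_small <- data.frame(
  V1 = c("1/1:205,54,0:18:0:57", "0/0:0,78,255:26:0:99"),
  V2 = c("1/1:141,39,0:13:0:42", "0/0:0,63,255:21:0:86"),
  stringsAsFactors = FALSE
)

# across(everything(), ...) applies the same sub() call to every column;
# the lazy group (.*?) captures the text before the first ":"
res <- mutate(M_small, across(everything(), ~ sub("(.*?)(:.*)", "\\1", .x)))
res
```

Unlike the M[] assignments in the other answers, this returns a new data frame and leaves the input untouched.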
Upvotes: 4
Reputation: 99331
You can use sub()
M[] <- lapply(M, sub, pattern = ":.*", replacement = "")
M
# V1 V2 V3 V4 V5
# 1 1/1 1/1 0/0 1/1 0/0
# 2 0/0 0/0 0/0 0/0 0/0
# 3 0/0 0/0 0/0 0/0 0/0
# 4 0/0 0/0 0/0 0/0 0/0
# 5 0/0 0/0 0/0 0/0 0/0
The above will overwrite the original M data. If you do not wish to overwrite M, assign it to a new variable name first or just use as.data.frame() around lapply():
as.data.frame(lapply(M, sub, pattern = ":.*", replacement = ""))
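One caveat: on R versions before 4.0.0, as.data.frame() converts character columns to factors by default. A sketch that keeps the result as character regardless of R version, using a subset of the question's data (M_small and M_new are illustrative names):

```r
# Two rows and two columns taken from the question's data
M_small <- data.frame(
  V1 = c("1/1:205,54,0:18:0:57", "0/0:0,78,255:26:0:99"),
  V2 = c("1/1:141,39,0:13:0:42", "0/0:0,63,255:21:0:86"),
  stringsAsFactors = FALSE
)

# ":.*" matches from the first ":" to the end of the string, so sub() drops it;
# stringsAsFactors = FALSE keeps the new columns as character on older R too
M_new <- as.data.frame(
  lapply(M_small, sub, pattern = ":.*", replacement = ""),
  stringsAsFactors = FALSE
)

M_new     # stripped copy
M_small   # original is unchanged
```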
Upvotes: 4