Reputation: 23
I'm trying to split strings whose fields are separated by runs of spaces. However, the number of spaces between fields is not always the same, e.g.
"abc 20"
"csd 10"
"eds 10 30"
and I'm trying to obtain the following:
"abc" " " "20"
"csd" "10" " "
"eds" "10" "30"
Any idea how to do this? Note that splitting on a fixed number of spaces is not possible, as the counts vary a bit. I was thinking about splitting on exactly one space that is either preceded or followed by a letter or a digit, but I have no clue how to do that.
Upvotes: 2
Views: 363
Reputation: 11
I got another solution that saves the labor of counting the spaces :>
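s_split = data.frame()
for (i in seq_len(nrow(df))) {
  s = df[i, 1]
  # split on single spaces; runs of spaces leave empty strings behind
  new_list = stringr::str_split_1(s, ' ')
  # drop the empty strings and turn the remaining fields into a one-row data frame
  temp = as.data.frame(t(new_list[new_list != '']))
  # bind_rows pads shorter rows with NA
  s_split = dplyr::bind_rows(s_split, temp)
}
s_split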
Here is the toy data based on the posts above:
a = "abc 20"
b = "csd 10"
c = "eds 10 30"
df = as.data.frame(rbind(a,b,c))
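For comparison, the same split-and-drop-empties idea can be sketched in base R without the loop. This is only a sketch: the spacing in `s` is an assumption, and the names `parts`, `n`, and `padded` are made up for illustration. Note that this style of splitting left-aligns the fields, so the "20" of the first line ends up in the second column rather than the third.

```r
# Hypothetical test data; the exact number of spaces is an assumption.
s <- c(paste0("abc", strrep(" ", 10), "20"),
       paste0("csd", strrep(" ", 3), "10"),
       paste0("eds", strrep(" ", 3), "10", strrep(" ", 5), "30"))

# Split on runs of one or more spaces, so no empty strings are produced.
parts <- strsplit(trimws(s), " +")

# Pad every row with NA up to the length of the longest row.
n <- max(lengths(parts))
padded <- t(sapply(parts, function(p) c(p, rep(NA, n - length(p)))))

padded
```

If the column position matters, as in the question, a position-aware approach is needed instead.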
Upvotes: 0
Reputation: 269634
1) read.fwf Try read.fwf. Adjust the widths as needed.
s <- c("abc 20", "csd 10", "eds 10 30") # test data
read.fwf(textConnection(s), widths = c(3, 7, 7))
giving:
   V1 V2 V3
1 abc NA 20
2 csd 10 NA
3 eds 10 30
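To make the fixed-width idea concrete, here is a rough sketch of the same cut done by hand with substring(). The spacing in `s` is an assumption, chosen so that field 2 falls in columns 4-10 and field 3 in columns 11-17; `starts`, `ends`, and `fields` are illustrative names, not part of read.fwf.

```r
# Hypothetical test data; the exact number of spaces is an assumption.
s <- c(paste0("abc", strrep(" ", 10), "20"),
       paste0("csd", strrep(" ", 3), "10"),
       paste0("eds", strrep(" ", 3), "10", strrep(" ", 5), "30"))

widths <- c(3, 7, 7)
ends   <- cumsum(widths)        # 3, 10, 17
starts <- ends - widths + 1     # 1, 4, 11

# Cut each string at the same column boundaries and trim the padding.
fields <- t(sapply(s, function(x) trimws(substring(x, starts, ends)),
                   USE.NAMES = FALSE))

# Columns that fall past the end of a short line come back empty; mark them NA.
fields[fields == ""] <- NA

fields
```

Unlike read.fwf, this keeps everything as character, so the numeric columns would still need converting.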
2) kmeans This approach finds the starting columns, g, of fields 2 and 3 and clusters them into two groups using kmeans. It assumes that field 1 is always present, since that seems to be the case in the question. Then, if there are only two fields on a line, it assigns the second field to the group whose center it is closest to.
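g <- gregexpr(" \\S", s)     # position of the space just before fields 2 and 3
km <- kmeans(unlist(g), 2)   # cluster those positions into two groups
centers <- sort(km$centers)
spl <- strsplit(s, " +")
# for a two-field line, insert one or two commas so that the second field
# lands in the column whose center is nearest
f <- function(s, g) {
  if (length(s) == 2) paste0(s[1], strrep(",", which.min(abs(g - centers))), s[2])
  else paste(s, collapse = ",")
}
read.table(text = mapply(f, spl, g), sep = ",", fill = TRUE, as.is = TRUE)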
giving:
   V1 V2 V3
1 abc NA 20
2 csd 10 NA
3 eds 10 30
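The nearest-center step may be easier to follow with the intermediate values spelled out. The sketch below is not the code above: it assumes a particular spacing in `s` and hard-codes `centers` as a stand-in for sort(km$centers), and `slot` is an illustrative name.

```r
# Hypothetical test data; the exact number of spaces is an assumption.
s <- c(paste0("abc", strrep(" ", 10), "20"),
       paste0("csd", strrep(" ", 3), "10"),
       paste0("eds", strrep(" ", 3), "10", strrep(" ", 5), "30"))

# Position of the space immediately before each later field, per line.
g <- lapply(gregexpr(" \\S", s), as.integer)

# Assumed centers for this data, standing in for sort(km$centers).
centers <- c(6, 13)

# For a line with a single extra field, pick the index of the nearer center:
# 1 means the field belongs in column 2, 2 means column 3.
slot <- sapply(g, function(pos) {
  if (length(pos) == 1) which.min(abs(pos - centers)) else NA_integer_
})

slot
```

Lines that already have both extra fields need no assignment, which is why they get NA here.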
Upvotes: 3