R split uneven strings with uneven number of spaces

Question

I'm trying to split strings whose fields are separated by multiple spaces. However, the number of spaces between fields is not always the same, e.g.

 "abc          20"
 "csd   10"
 "eds     10     30"

and I'm trying to obtain the following:

"abc" " " "20"
"csd" "10" " "
"eds" "10" "30"

Any idea how to do this? Note that splitting on a fixed number of spaces is not possible, because the number varies. I was thinking about splitting on exactly one space that is either preceded or followed by a letter or a digit, but I have no clue how to do that.

G. Grothendieck · Accepted Answer

1) read.fwf. Try read.fwf, which reads fixed-width fields. Adjust the widths as needed.

s <- c("abc          20", "csd   10", "eds     10     30")  # test data
read.fwf(textConnection(s), widths = c(3, 7, 7))

giving:

   V1 V2 V3
1 abc NA 20
2 csd 10 NA
3 eds 10 30
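
If character fields like those shown in the question are wanted instead of a data frame, the same fixed-width idea can be sketched with substring. This is only an illustration under the same assumed widths of 3, 7 and 7; the names widths, starts, ends and fields are made up for the example.

```r
s <- c("abc          20", "csd   10", "eds     10     30")  # test data from the answer

widths <- c(3, 7, 7)          # assumed field widths, same as passed to read.fwf
ends <- cumsum(widths)        # last character of each field
starts <- ends - widths + 1   # first character of each field

# Cut every string at the same positions, then strip the padding.
# substring() returns "" when a field lies past the end of the string.
fields <- t(sapply(s, function(x) trimws(substring(x, starts, ends)),
                   USE.NAMES = FALSE))
fields
```

Each row of fields corresponds to one input string. A missing field comes back as an empty string "" rather than NA or " ".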

2) kmeans. This approach finds the starting positions, g, of fields 2 and 3 (strictly, the position of the space just before each of those fields) and clusters them into two groups using kmeans. It assumes that field 1 is always present, since that seems to be the case in the question. Then, if a line has only two fields, it assigns the second field to whichever cluster center its starting position is closest to.

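g <- gregexpr(" \\S", s)  # position of the space just before fields 2 and 3
km <- kmeans(unlist(g), 2)  # cluster those positions into two groups
centers <- sort(km$centers)  # smaller center = field 2, larger = field 3
spl <- strsplit(s, " +")
# for a two-field line, insert 1 or 2 commas depending on the nearest center
f <- function(s, g) {
  if (length(s) == 2) paste0(s[1], strrep(",", which.min(abs(g - centers))), s[2])
  else paste(s, collapse = ",")
}
read.table(text = mapply(f, spl, g), sep = ",", fill = TRUE, as.is = TRUE)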

giving:

   V1 V2 V3
1 abc NA 20
2 csd 10 NA
3 eds 10 30
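
To see what the clustering step is doing, the intermediate values can be inspected on their own. The following is a rough sketch, not part of the answer's code: the names pos and nearest are made up for illustration, and the set.seed call is an assumption added only so that repeated runs behave the same way.

```r
s <- c("abc          20", "csd   10", "eds     10     30")  # test data from the answer

# Positions of each space that is immediately followed by a non-space,
# i.e. where fields 2 and 3 begin, pooled over all lines.
pos <- unlist(gregexpr(" \\S", s))

set.seed(42)                   # assumption: fixed seed for repeatability only
km <- kmeans(pos, centers = 2)
centers <- sort(km$centers)    # smaller center = field 2, larger = field 3

# For each position, the index of the nearest center (1 = field 2, 2 = field 3).
nearest <- sapply(pos, function(p) which.min(abs(p - centers)))
data.frame(position = pos, field = nearest + 1)
```

The positions are listed in line order, so the first row belongs to the first string, the second to the second string, and the last two to the third string. With data where the two column positions overlap, the clusters may not separate cleanly, so the result is worth checking by eye.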
