Reputation: 101
I would like to build a table from raw text using readr::read_fwf. Its col_positions argument determines the column widths, which in my case can differ from one input to the next.
The table always has 4 columns, defined by the first 4 words of a header string like the one below:
category    variable   description      value      sth
> text_for_column_width = "category    variable   description      value      sth"
> nchar("category    ")
[1] 12
> nchar("variable   ")
[1] 11
> nchar("description      ")
[1] 17
> nchar("value      ")
[1] 11
I want to extract the first 4 words while keeping their trailing spaces, so that category counts as 8 letters + 4 spaces = 12 characters, and finally build a vector with the number of characters for each of the four names: c(12, 11, 17, 11). I tried strsplit with a space as the split argument and then counting the resulting zero-length strings, but I believe there is a faster way using a proper regular expression.
Upvotes: 1
Views: 625
Reputation: 1925
You can also use this pattern:
library(magrittr)  # provides the %>% pipe
stringr::str_split("category    variable   description      value      sth", "\\s+") %>%
unlist() %>%
purrr::map_int(nchar)
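Note that splitting on \\s+ discards the whitespace, so this returns the word lengths (and also counts sth) rather than the column widths. A minimal base R sketch that keeps the trailing spaces, assuming the header is spaced as in the question (the header string is rebuilt with strrep() purely for illustration, and the names header, tokens and widths are my own):

```r
# Illustrative header: spacing chosen to give the widths from the question.
header <- paste0(
  "category",    strrep(" ", 4),
  "variable",    strrep(" ", 3),
  "description", strrep(" ", 6),
  "value",       strrep(" ", 6),
  "sth"
)

# Match each word together with the whitespace that follows it.
tokens <- regmatches(header, gregexpr("\\S+\\s*", header, perl = TRUE))[[1]]

# Keep only the first four columns and count their characters.
widths <- nchar(head(tokens, 4))
widths
#> [1] 12 11 17 11
```

head(tokens, 4) drops anything after the fourth word, so extra trailing words such as sth do not affect the result.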
Upvotes: 0
Reputation: 626709
You can use utils::strcapture:
text_for_column_width = "category    variable   description      value      sth"
pattern <- "^(\\S+\\s+)(\\S+\\s+)(\\S+\\s+)(\\S+\\s*)"
result <- utils::strcapture(pattern, text_for_column_width, list(f1 = character(), f2 = character(), f3 = character(), f4 = character()))
nchar(as.character(as.vector(result[1,])))
## => [1] 12 11 17 11
See the regex demo. The ^(\S+\s+)(\S+\s+)(\S+\s+)(\S+\s*) pattern matches:
- ^ - start of string
- (\S+\s+) - Group 1: one or more non-whitespace chars and then one or more whitespaces
- (\S+\s+) - Group 2: one or more non-whitespace chars and then one or more whitespaces
- (\S+\s+) - Group 3: one or more non-whitespace chars and then one or more whitespaces
- (\S+\s*) - Group 4: one or more non-whitespace chars and then zero or more whitespaces
Upvotes: 1
Reputation: 25323
A possible solution, using stringr:
library(tidyverse)
text_for_column_width = "category    variable   description      value      sth"
strings <- text_for_column_width %>%
  str_remove("sth$") %>%
  str_split("(?<=\\s)(?=\\S)") %>%
  unlist
strings
#> [1] "category    "      "variable   "       "description      "
#> [4] "value      "
strings %>% str_count
#> [1] 12 11 17 11
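
To connect this back to the question, here is a hedged sketch of passing such widths to readr::read_fwf through fwf_widths(). The data rows are made up for illustration and padded with sprintf() so that they line up with the header; col_names, fmt, lines and tbl are names I introduced, not part of the question.

```r
library(readr)

widths <- c(12, 11, 17, 11)  # the vector computed above
col_names <- c("category", "variable", "description", "value")

# Hypothetical rows, left-aligned and padded to the same widths.
fmt <- "%-12s%-11s%-17s%-11s"
lines <- c(
  sprintf(fmt, "category", "variable", "description", "value"),
  sprintf(fmt, "a", "x1", "first item", "1"),
  sprintf(fmt, "b", "x2", "second item", "2")
)

# A string containing newlines is treated by readr as literal data.
tbl <- read_fwf(
  paste0(paste(lines, collapse = "\n"), "\n"),
  col_positions = fwf_widths(widths, col_names = col_names),
  skip = 1  # skip the header line, since the names are supplied above
)
tbl
```

If the widths change between files, only the widths vector needs to be recomputed from each header line.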
Upvotes: 4