Stefan

Reputation: 895

Regex - separate multiple words and whitespace from decimal numbers at the end

I have a string containing words, whitespace and numbers (integers and decimals). I want to separate them into two columns in a data frame so that column A contains the text and column B contains the number. It seems like a super simple task but I cannot figure out how to capture the text. I did capture the numbers though.

require(tidyr)
df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))

This captures the number in column B, but I cannot figure out what regex to put in the first set of parentheses to get the text into A:

df |> 
  extract(x, c("A", "B"), "()(\\d+\\.*\\d*)")
#  A      B
#1        0
#2     0.01
#3   12.231

Upvotes: 2

Views: 47

Answers (3)

moodymudskipper

Reputation: 47350

With {unglue} you might do:

df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))
unglue::unglue_unnest(df, x, "{A}{B=[0-9.]+}")
#>                          A      B
#> 1             This is text      0
#> 2 This is a bit more text    0.01
#> 3           Even more text 12.231

Created on 2022-11-24 with reprex v2.0.2

Upvotes: 2

akrun

Reputation: 887991

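We capture one or more letters or spaces with ([A-Za-z ]+), skip any remaining whitespace with \s*, and then capture the digits and the decimal point with ([0-9.]+). With convert = TRUE, column B is converted to numeric.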

library(tidyr)
extract(df, x, into = c("A", "B"), "([A-Za-z ]+)\\s*([0-9.]+)", convert = TRUE)
                         A      B
1             This is text  0.000
2 This is a bit more text   0.010
3           Even more text 12.231
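
One detail worth checking: because the character class [A-Za-z ] includes the space, the first group is greedy enough to swallow the space before the number, so A can keep a trailing space (row 2 here). A minimal base R sketch of that check, assuming the three sample strings from the question; the names x, pattern, parts, A_raw and A_trimmed are illustrative only:

```r
# Sample strings from the question
x <- c("This is text0", "This is a bit more text 0.01", "Even more text12.231")

# Same pattern as above; regexec() returns the full match plus each capture group
pattern <- "([A-Za-z ]+)\\s*([0-9.]+)"
parts <- do.call(rbind, regmatches(x, regexec(pattern, x, perl = TRUE)))

A_raw <- parts[, 2]                          # group 1, may end with a space
A_trimmed <- trimws(A_raw, which = "right")  # drop trailing whitespace

# Number of trailing whitespace characters removed per row
trailing <- nchar(A_raw) - nchar(A_trimmed)
print(trailing)
```

If the trailing space matters, trimws() on column A after extract() is one way to remove it.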

Upvotes: 2

Wiktor Stribiżew

Reputation: 627607

You can use

df |> 
  extract(x, c("A", "B"), "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$")

Details:

  • ^ - start of string
  • (.*?) - Group 1: zero or more characters other than line breaks, as few as possible (lazy)
  • \s* - zero or more whitespace characters
  • (\d+(?:\.\d+)?) - Group 2: one or more digits and then an optional sequence of . and one or more digits
  • $ - end of string
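
To see the groups in action without tidyr, here is a minimal base R sketch. It assumes the three sample strings from the question, and the names x, pattern, parts and result are illustrative only. perl = TRUE is passed so that the lazy quantifier and the non-capturing group are read as PCRE syntax.

```r
# Sample strings from the question
x <- c("This is text0", "This is a bit more text 0.01", "Even more text12.231")

# Same pattern as in the extract() call above
pattern <- "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$"

# regexec() returns the full match followed by the two capturing groups;
# the non-capturing group (?:...) does not get its own column
parts <- do.call(rbind, regmatches(x, regexec(pattern, x, perl = TRUE)))

result <- data.frame(
  A = parts[, 2],              # Group 1: text, without trailing whitespace
  B = as.numeric(parts[, 3]),  # Group 2: number, converted to numeric
  stringsAsFactors = FALSE
)
print(result)
```

Because Group 1 is lazy and \s* sits between the groups, the whitespace before the number ends up in neither column.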

Upvotes: 2
