Reputation: 895
I have a string containing words, whitespace and numbers (integers and decimals). I want to separate them into two columns in a data frame so that column A contains the text and column B contains the number. It seems like a super simple task, but I cannot figure out how to capture the text. I did capture the numbers, though.
require(tidyr)
df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))
I captured the number in column B, but I cannot figure out what regex to put in the first set of parentheses to get the text in A:
df |>
extract(x, c("A", "B"), "()(\\d+\\.*\\d*)")
# A B
#1 0
#2 0.01
#3 12.231
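A minimal base-R sketch (not part of the original question) of why column A comes back empty: the pattern requires nothing before the number, so the first match starts at the first digit and the empty group () captures the empty string. It uses regexec() and regmatches() from base R with perl = TRUE (available since R 3.3.0); the vector x simply repeats the example strings.

```r
# The strings from the question
x <- c("This is text0", "This is a bit more text 0.01", "Even more text12.231")

# The question's pattern: an empty group 1, then the number as group 2.
# regexec() records where the first match and each group start;
# regmatches() turns those positions back into strings.
m <- regmatches(x, regexec("()(\\d+\\.*\\d*)", x, perl = TRUE))

# Each element is c(full match, group 1, group 2). Group 1 is always "",
# because the match only begins at the first digit.
m[[2]]
```

Whatever goes into the first parentheses therefore has to match the text that precedes the number, not just sit in front of it.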
Upvotes: 2
Views: 47
Reputation: 47350
With {unglue} you might do:
df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))
unglue::unglue_unnest(df, x, "{A}{B=[0-9.]+}")
#> A B
#> 1 This is text 0
#> 2 This is a bit more text 0.01
#> 3 Even more text 12.231
Created on 2022-11-24 with reprex v2.0.2
Upvotes: 2
Reputation: 887991
We capture one or more letters or spaces (([A-Za-z ]+)), followed by optional whitespace (\\s*) and then the digits and dots ([0-9.]+):
library(tidyr)
extract(df, x, into = c("A", "B"), "([A-Za-z ]+)\\s*([0-9.]+)", convert = TRUE)
A B
1 This is text 0.000
2 This is a bit more text 0.010
3 Even more text 12.231
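One detail the printed output hides: [A-Za-z ]+ is greedy and its class includes the space, so in the second row group 1 also captures the space before 0.01 and \\s* matches nothing. The sketch below is an illustration, not part of the answer; it reproduces group 1 with base R's sub() using the same pattern, and trimws() is an added step that strips the leftover space.

```r
df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))

# Same pattern as in the extract() call; sub() replaces the whole match by group 1 (\\1).
pattern <- "([A-Za-z ]+)\\s*([0-9.]+)"
A_raw <- sub(pattern, "\\1", df$x, perl = TRUE)

# The greedy [A-Za-z ]+ keeps the space before "0.01" in row 2.
nchar(A_raw)

# trimws() (base R) removes leading and trailing whitespace.
A <- trimws(A_raw)
A
```

With tidyr, the same trimws() call can be applied to column A of the data frame that extract() returns.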
Upvotes: 2
Reputation: 627607
You can use
df |>
  extract(x, c("A", "B"), "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$")
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\s* - zero or more whitespaces
(\d+(?:\.\d+)?) - Group 2: one or more digits, then an optional sequence of a . and one or more digits
$ - end of string
Upvotes: 2
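For readers not using tidyr, the same pattern can be applied with base R. This is a sketch, not part of the answer: sub() with a single backreference keeps one group at a time, and perl = TRUE selects the PCRE engine, which supports the lazy .*? and the non-capturing (?:...) group. Converting column B with as.numeric() plays the role of convert = TRUE in the other answer.

```r
df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))

# ^(.*?)  lazy text part (group 1), \\s* optional whitespace,
# (\\d+(?:\\.\\d+)?)$  integer or decimal at the end of the string (group 2)
pattern <- "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$"

# sub() replaces the whole match by one group, so each call builds one column.
out <- data.frame(
  A = sub(pattern, "\\1", df$x, perl = TRUE),
  B = as.numeric(sub(pattern, "\\2", df$x, perl = TRUE))
)
out
```

Because group 1 is lazy, the whitespace before the number is consumed by \s* rather than ending up in column A. A string that does not match is returned unchanged by sub(), so its B value would become NA (with a coercion warning).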