Rmeow

Extract only numbers that contain a decimal point from strings

I have a dataframe with strings such as:

id <- c(1,2)
x <- c("...14.....5.......................395.00.........................14.........1..",
   "......114.99....................124.99................")
df <- data.frame(id,x)
df$x <- as.character(df$x)

How can I extract only the numbers that contain a decimal point, such as 395.00, 114.99 and 124.99 (but not 14, 5 or 1), from each row and put them in a new column, separated by commas?

The ideal result would be:

  id            x2
  1         395.00
  2  114.99,124.99

The number of periods separating the values is random.


Answers (1)

Gregor Thomas

library(stringr)
df$x2 = str_extract_all(df$x, "[0-9]+\\.[0-9]+")

df[c(1, 3)]
#   id             x2
# 1  1         395.00
# 2  2 114.99, 124.99

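Explanation: [0-9]+ matches one or more digits, and \\. matches a single literal period (the backslash is doubled because the pattern is written inside an R string). Because the pattern needs a digit on both sides of exactly one period, runs such as 14.....5 never match. str_extract_all returns every match in each string.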
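A quick way to confirm this behaviour is to run the same pattern on fragments of the first string. This is a small sketch; the variable name pattern is just for illustration.

```r
library(stringr)

pattern <- "[0-9]+\\.[0-9]+"

# One period flanked by digits: matches
str_extract_all("395.00", pattern)[[1]]
# [1] "395.00"

# Digits separated by several periods: after "14." the next character
# is another ".", not a digit, so there is no match
str_extract_all("14.....5", pattern)[[1]]
# character(0)

# Surrounding periods are not part of the match; only the number is returned
str_extract_all("......114.99....", pattern)[[1]]
# [1] "114.99"
```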

The new column is a list column, not a string with an inserted comma. This gives you access to the individual elements, if needed:

df$x2[2]
# [[1]]
# [1] "114.99" "124.99"
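Because each element is a character vector, the values can also be converted to numbers row by row. The following is a sketch assuming you want numeric values; the column name x2_num is hypothetical. Note that as.numeric drops trailing zeros, so "395.00" becomes 395.

```r
library(stringr)

df <- data.frame(
  id = c(1, 2),
  x = c("...14.....5.......................395.00.........................14.........1..",
        "......114.99....................124.99................"),
  stringsAsFactors = FALSE
)
df$x2 <- str_extract_all(df$x, "[0-9]+\\.[0-9]+")

# [[ ]] returns the character vector itself instead of a one-element list
df$x2[[2]]
# [1] "114.99" "124.99"

# Hypothetical numeric list column: one numeric vector per row
df$x2_num <- lapply(df$x2, as.numeric)
sum(df$x2_num[[2]])
# [1] 239.98
```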

If you prefer a character vector as the column, do this:

df$x3 = sapply(str_extract_all(df$x, "[0-9]+\\.[0-9]+"), paste, collapse = ",")

df$x3[2]
#[1] "114.99,124.99"
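If you would rather avoid the stringr dependency, base R's gregexpr and regmatches can produce the same two columns with the same pattern. This is an alternative sketch, not part of the answer above. It uses vapply instead of sapply so the result is always a character vector, even for a data frame with zero rows (where sapply would return an empty list).

```r
id <- c(1, 2)
x <- c("...14.....5.......................395.00.........................14.........1..",
       "......114.99....................124.99................")
df <- data.frame(id, x, stringsAsFactors = FALSE)

# Same pattern: digits, one literal period, digits
matches <- regmatches(df$x, gregexpr("[0-9]+\\.[0-9]+", df$x))

# List column, equivalent to str_extract_all()
df$x2 <- matches

# Comma-separated character column; rows without a match become ""
df$x3 <- vapply(matches, paste, character(1), collapse = ",")

df[c("id", "x3")]
#   id            x3
# 1  1        395.00
# 2  2 114.99,124.99
```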
