sotnik

Reputation: 119

Extract all percentage numbers from a data frame column

I have a data.frame df with a character column text that contains text. From that column, I would like to extract all percentage numbers (say, 1.2% and -2.3%) but not the ordinary numbers (say, 123 and 1.2) into a character vector.

A small example:

df <- data.frame(text = c("this text is 1.3% this is +1.4% and this -1.5%",
                          "this text is 123.3% this 123.3 and this 1234.5"))

Required output:

[1] "1.3%" "+1.4%" "-1.5%" "123.3%"

Is that possible?

Upvotes: 2

Views: 123

Answers (2)

Zheyuan Li

Reputation: 73275

Probably not the most robust general-purpose solution, but it works for your example:

unlist(stringr::str_extract_all(df$text, "[+\\-]?[0-9\\.]+%"))
#[1] "1.3%"   "+1.4%"  "-1.5%"  "123.3%"

## or using R's native forward pipe operator, since R 4.1.0
stringr::str_extract_all(df$text, "[+\\-]?[0-9\\.]+%") |> unlist()
#[1] "1.3%"   "+1.4%"  "-1.5%"  "123.3%"

This matches your expected output (i.e., a character vector). But if you want to store the results in a new data frame column, you don't want to unlist(). Just do:

df$percentages <- stringr::str_extract_all(df$text, "[+\\-]?[0-9\\.]+%")
df
#                                            text        percentages
#1 this text is 1.3% this is +1.4% and this -1.5% 1.3%, +1.4%, -1.5%
#2 this text is 123.3% this 123.3 and this 1234.5             123.3%

The new column percentages itself is a list:

str(df$percentages)
#List of 2
# $ : chr [1:3] "1.3%" "+1.4%" "-1.5%"
# $ : chr "123.3%"
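
For readers without stringr, the same extraction can be sketched in base R with gregexpr() and regmatches(). This is a minimal sketch under the same pattern as above; the names pattern, matches, percentages and values are illustrative and not part of the answer. The last step, converting the strings to numbers, is an optional extra.

```r
# Example data from the question
df <- data.frame(text = c("this text is 1.3% this is +1.4% and this -1.5%",
                          "this text is 123.3% this 123.3 and this 1234.5"),
                 stringsAsFactors = FALSE)

# Optional sign, then digits and dots, then a literal percent sign
pattern <- "[+-]?[0-9.]+%"

# regmatches() returns a list with one character vector per row,
# so it could also be assigned directly as a list column
matches <- regmatches(df$text, gregexpr(pattern, df$text))

# Flatten to a single character vector
percentages <- unlist(matches)
percentages
#[1] "1.3%"   "+1.4%"  "-1.5%"  "123.3%"

# Optional: drop the percent sign and convert to numeric
values <- as.numeric(sub("%", "", percentages, fixed = TRUE))
```

Like the stringr version, this pattern also accepts malformed tokens such as "1..2%", so it is a convenience for clean input rather than a validator.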

Upvotes: 3

TarJae

Reputation: 78927

Here is an alternative tidyverse way:

First we extract the numbers with parse_number() from the readr package, and then, inside ifelse(), we paste the number and a percent sign together wherever the text contains a percent sign. Finally, pull() returns the result as a vector.

library(tidyverse)

df %>% 
  mutate(x = parse_number(text),
         x = ifelse(str_detect(text, "%"), paste0(x,"%"), NA_character_)) %>% 
  pull(x)
#[1] "1.3%"   "1.4%"   "-1.5%"  "123.3%" NA       NA
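
Note that parse_number() returns a single number per string (the first one it finds), so getting one result per percentage requires the text to be split into pieces first. A minimal sketch of that idea, assuming the pieces can be separated at a space that follows a digit or a percent sign; the split pattern and the names pieces, x and result are illustrative assumptions, not part of the answer above:

```r
library(readr)  # for parse_number()

df <- data.frame(text = c("this text is 1.3% this is +1.4% and this -1.5%",
                          "this text is 123.3% this 123.3 and this 1234.5"),
                 stringsAsFactors = FALSE)

# Split each row at a space preceded by a digit or "%",
# so every piece contains exactly one number
pieces <- unlist(strsplit(df$text, "(?<=[0-9%]) ", perl = TRUE))

# First number found in each piece
x <- parse_number(pieces)

# Keep the number only where the piece contains a percent sign
result <- ifelse(grepl("%", pieces, fixed = TRUE), paste0(x, "%"), NA_character_)
result
```

Pieces without a percent sign come back as NA; they can be dropped with result[!is.na(result)] if only the percentages are wanted.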

Upvotes: 3
