Mohd Syazwan

Reputation: 51

Extract % from character column in R data frame

I have a column, seller_details, in a data frame. Examples of the values in the seller_details column:

1. 8ysl9a1301 Active 3 hours ago chat now view shop Ratings10products996 response rate28% response
2. showcasemywardore Active 3 hours ago chat now view shop Ratings773products5k response rate70% response
3. zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response
4. airspacemy.os Active 14 minutes ago chat now view shop Ratings1.2kproducts2k response rate70% response
5. zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response

I want to extract only the response rate percentage from each value.

Expected output:

1. 28
2. 70
3. 93
4. 70
5. 93

Thank you

Data

structure(list(seller_details = c("8ysl9a1301 Active 3 hours ago chat now view shop Ratings10products996 response rate28% response", 
"showcasemywardore Active 3 hours ago chat now view shop Ratings773products5k response rate70% response", 
"zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response", 
"airspacemy.os Active 14 minutes ago chat now view shop Ratings1.2kproducts2k response rate70% response", 
"zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response"
)), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 1

Views: 57

Answers (2)

OJJ

Reputation: 29

Admittedly this is verbose compared to the solution posted by Karthik S, but you can try base R's gsub function.

gsub("(.*rate)(\\d+)(%)(\\s[a-z]*)", "\\2", df$seller_details, perl = TRUE)

The parentheses act as capture groups.

Group one: (.*rate)
Group two: (\\d+)
Group three: (%)
Group four: (\\s[a-z]*)

The replacement string is \\2, which refers to the group numbers listed above. You want to return only group two: the run of digits that comes right after "rate" and is followed by a %. Group one consumes everything up to and including "rate", so the pattern also matches shop names that do not start with a digit.

Group four is necessary to capture the end of the string, in your case, the word "response".

There are probably cleaner ways to do this, but it will work nonetheless.
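
If you also want the result as a number, or some rows might not contain a response rate at all, a capture-group variant could look like the sketch below. The helper name extract_rate and the new column name response_rate are my own, not from the question, and the sketch assumes the rate always appears as "rate" followed directly by digits and a % sign.

```r
# Sample data copied from the question.
df <- data.frame(seller_details = c(
  "8ysl9a1301 Active 3 hours ago chat now view shop Ratings10products996 response rate28% response",
  "showcasemywardore Active 3 hours ago chat now view shop Ratings773products5k response rate70% response",
  "zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response",
  "airspacemy.os Active 14 minutes ago chat now view shop Ratings1.2kproducts2k response rate70% response",
  "zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response"
), stringsAsFactors = FALSE)

# Hypothetical helper: returns the digits between "rate" and "%" as an
# integer, and NA for strings where that pattern does not occur.
extract_rate <- function(x) {
  pattern <- ".*rate(\\d+)%.*"
  hit <- grepl(pattern, x)                 # which strings contain a rate
  out <- rep(NA_integer_, length(x))       # default to NA for the rest
  out[hit] <- as.integer(sub(pattern, "\\1", x[hit]))
  out
}

df$response_rate <- extract_rate(df$seller_details)
```

Checking with grepl first matters because sub returns the input unchanged when nothing matches, which would otherwise leave the whole seller string in the result.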

Upvotes: 1

Karthik S

Reputation: 11584

Does this work?

library(dplyr)
library(stringr)
df %>% mutate(percent = str_extract(col, '(?<=rate)\\d{2,3}'))
                                                                                                     col percent
1        8ysl9a1301 Active 3 hours ago chat now view shop Ratings10products996 response rate28% response      28
2 showcasemywardore Active 3 hours ago chat now view shop Ratings773products5k response rate70% response      70
3 zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response      93
4 airspacemy.os Active 14 minutes ago chat now view shop Ratings1.2kproducts2k response rate70% response      70
5 zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response      93

Data used:

df
                                                                                                     col
1        8ysl9a1301 Active 3 hours ago chat now view shop Ratings10products996 response rate28% response
2 showcasemywardore Active 3 hours ago chat now view shop Ratings773products5k response rate70% response
3 zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response
4 airspacemy.os Active 14 minutes ago chat now view shop Ratings1.2kproducts2k response rate70% response
5 zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response
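
If you would rather not load any packages, the same lookbehind idea can probably be written in base R with regexpr and regmatches. This is only a sketch: it uses \\d+ instead of \\d{2,3} so that single-digit rates are also picked up, and the last string in x is a made-up example (not from the question) to show what happens when no rate is present.

```r
x <- c(
  "8ysl9a1301 Active 3 hours ago chat now view shop Ratings10products996 response rate28% response",
  "showcasemywardore Active 3 hours ago chat now view shop Ratings773products5k response rate70% response",
  "zanzea.os Active 37 minutes ago chat now view shop Ratings290.5kproducts6.6k response rate93% response",
  "someshop Active 1 hour ago chat now view shop"  # hypothetical row without a rate
)

# (?<=rate) is a lookbehind, so only the digits themselves are matched.
# Lookbehind needs the PCRE engine, hence perl = TRUE.
m <- regexpr("(?<=rate)\\d+", x, perl = TRUE)

# regexpr returns -1 where there is no match; regmatches drops those
# elements, so fill a vector of NAs at the matching positions only.
hit <- as.vector(m) != -1L
percent <- rep(NA_integer_, length(x))
percent[hit] <- as.integer(regmatches(x, m))
```

Here percent is an integer vector rather than character, which is usually more convenient if you want to sort or filter by response rate afterwards.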

Upvotes: 3
