Ashish25

Reputation: 2023

In R, how do I manipulate character strings using gsub() and perform multivariate data cleaning efficiently?

I’m a rookie at R programming and I'm stuck on character string manipulation. If someone can reply with the code, I’d appreciate it. The data frame contains Forbes’ top 100 brands. I want to clean a specific column, ‘Company Advertising’, as described below (see the attached screenshot).

Forbes' top 100 companies with their advertising expenditure column

The resulting column should look like this:

Before: 1.2 B, 2.3 B, 3 B, 808 M

After: 1.2, 2.3, 3, 0.808

Upvotes: 0

Views: 194

Answers (2)

akrun

Reputation: 887681

gsubfn is perfect for this task:

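library(gsubfn)
# Replace each unit letter with a multiplier expression ("1.2 B" -> "1.2 * 1",
# "808 M" -> "808 * 1e-3"), then evaluate each resulting string as R code
as.vector(sapply(gsubfn("[A-Z]", list(B = "* 1", M = "* 1e-3"), x),
                 function(x) eval(parse(text = x))))
#[1] 1.200 2.500 0.808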

data

x <- c("1.2 B", "2.5 B", "808 M")

Upvotes: 1

Alexey Ferapontov

Reputation: 5169

x = c("1.2 B", "2.5 B", "808 M")

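# Strip the trailing unit once, so as.numeric() never sees a leftover letter
# (converting inside both ifelse() branches warns about NAs from coercion)
num = as.numeric(gsub("\\s*[BM]$", "", x))
# Keep billions as-is; convert millions to billions
y = ifelse(grepl("B$", x), num, num / 1000)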
View(y)


    x
1   1.200
2   2.500
3   0.808
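To apply this kind of conversion to the column from the question rather than to a standalone vector, something like the following might work. The data frame `forbes` and its `Brand` values are made up for illustration; only the column name ‘Company Advertising’ and the sample values come from the question.

```r
# Hypothetical stand-in for the Forbes data frame described in the question
forbes <- data.frame(
  Brand = c("Brand1", "Brand2", "Brand3", "Brand4"),
  `Company Advertising` = c("1.2 B", "2.3 B", "3 B", "808 M"),
  check.names = FALSE,       # keep the space in the column name
  stringsAsFactors = FALSE
)

adv <- forbes[["Company Advertising"]]

# Drop the trailing unit, then convert millions to billions
num <- as.numeric(sub("\\s*[BM]$", "", adv))
forbes[["Company Advertising"]] <- ifelse(grepl("M$", adv), num / 1000, num)

forbes[["Company Advertising"]]
#[1] 1.200 2.300 3.000 0.808
```

Because the column name contains a space, it is accessed with `[[ ]]` here; `forbes$`Company Advertising`` with backticks would be an equivalent choice.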

Upvotes: 0
