Reputation: 263
I have a data frame of strings containing the sector-specific revenue contribution of certain companies. I need to extract a matrix that contains only the revenue share from software. The data is as follows:
revenue <- data.frame(revenue = c("79% Software, 1% Hardware, 20% Services", NA, NA, "10.5% Software, 90% Services", "1.4% Software, 98.6% Services", "17% Software, 83% Services", NA, "100% Services", "47% Services, 39% Hardware, 14.32% Software"))
I want to give "Software" as the ending pattern and then extract the number immediately to its left, i.e. the percentage, whether it is an integer or a decimal.
My solution works, but it is quite lengthy. How can I extract the values in a single line?
EDIT
As asked by @SabDem in a comment, here is my code:
library("stringr")
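revenue <- as.matrix(revenue)
# Split each entry into at most three comma-separated sector strings
rs <- str_split_fixed(revenue, ',', 3)
# Keep only the pieces that mention Software; everything else stays "0"
rs1 <- matrix("0", nrow(rs), ncol(rs))
for (i in 1:nrow(rs)) {
  for (j in 1:ncol(rs)) {
    # grepl returns FALSE (not integer(0)) when there is no match or the value is NA
    if (grepl('Software', rs[i, j])) rs1[i, j] <- rs[i, j]
  }
}
# Strip the label and the percent sign, then sum the remaining numbers per row
rs2 <- gsub('Software|%', '', rs1)
soft.revenue <- rowSums(matrix(as.numeric(rs2), nrow(rs2)))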
Upvotes: 0
Views: 101
Reputation: 405
I would use the stringr library. For your example it would be:
library("stringr")
revenue <- data.frame(revenue = c("79% Software, 1% Hardware, 20% Services", NA, NA, "10.5% Software, 90% Services", "1.4% Software, 98.6% Services", "17% Software, 83% Services", NA, "100% Services", "47% Services, 39% Hardware, 14.32% Software"))
# Digits, optionally followed by a literal dot and more digits, directly before "% Software"
pattern <- "[[:digit:]]+(\\.[[:digit:]]+)?(?=% Software)"
as.numeric(str_extract(revenue$revenue,pattern))
The core idea is the lookahead expression (?=% Software), which requires the number to be immediately followed by the string "% Software" without including that string in the match. Variable-length lookbehind is (as far as I know) not possible in R.
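If you would rather not depend on stringr, the same lookahead idea can be sketched in base R. This is only a sketch, assuming every software share is written exactly as a number followed by "% Software"; the helper name extract_software_pct is made up for this example and is not part of any package.

```r
# Hypothetical helper: return the Software percentage for each string,
# or NA where the string is NA or has no Software entry.
extract_software_pct <- function(x) {
  x <- as.character(x)  # also handles factor columns
  pattern <- "[0-9]+(\\.[0-9]+)?(?=% Software)"
  # perl = TRUE is needed for the lookahead; no match gives -1, NA input gives NA
  m <- regexpr(pattern, x, perl = TRUE)
  len <- attr(m, "match.length")
  out <- rep(NA_real_, length(x))
  hit <- !is.na(m) & m > 0
  out[hit] <- as.numeric(substring(x[hit], m[hit], m[hit] + len[hit] - 1))
  out
}

revenue <- data.frame(revenue = c("79% Software, 1% Hardware, 20% Services", NA,
                                  "100% Services",
                                  "47% Services, 39% Hardware, 14.32% Software"))
soft.revenue <- extract_software_pct(revenue$revenue)
```

Rows without a Software entry come back as NA rather than 0, so wrap the result in ifelse(is.na(...), 0, ...) if you need zeros.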
Upvotes: 1