user3919790

Reputation: 557

Extracting matched words from a string

I have a data frame; an abbreviated version is below:

daly <- structure(list(sex1 = c("totalmaleglobal", "totalfemaleglobal", 
"totalglobal", "totalfemaleGSK", "totalfemaleglobal", 
"totalfemaleUN")), .Names = "sex1", row.names = c(NA, 6L),
class="data.frame")

I want to extract the words 'total', 'totalmale', and 'totalfemale'.

How do I do this?

I tried regex with the following code:

library(stringr)

pattern1=c("total")
pattern2=c("totalmale")
pattern3=c("totalfemale")

daly$sex <- str_extract(daly$sex1,pattern1)
daly$sex <- str_extract(daly$sex1,pattern2)
daly$sex <- str_extract(daly$sex1,pattern3)

But it's giving me NA.

Upvotes: 5

Views: 203

Answers (4)

989

Reputation: 12935

We could also use sapply and grepl (base R) over the wanted patterns (the s1 vector), where d1 is the data frame, as follows:

x <- sapply(s1,function(x) grepl(x, d1$sex1))
colnames(x)[max.col(x, ties.method = "first")]

# [1] "totalmale" "totalfemale" "total" "totalfemale" "totalfemale" "totalfemale"

where

s1 <- c("totalmale", "totalfemale", "total")
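
To see why this works, it can help to look at the intermediate logical matrix. The following is a minimal self-contained sketch of the same idea. The data frame d1 is rebuilt here from the question's data, which is an assumption about how it was created, and fixed = TRUE is added only so the patterns are treated as literal text.

```r
# Rebuild the example data (assumed to match the question's data frame)
d1 <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

# Longer patterns come first so they win ties against the shorter "total"
s1 <- c("totalmale", "totalfemale", "total")

# One logical column per pattern: TRUE where the pattern occurs in the string
x <- sapply(s1, function(p) grepl(p, d1$sex1, fixed = TRUE))

# For each row, pick the first column that holds the row maximum (TRUE)
matched <- colnames(x)[max.col(x, ties.method = "first")]
print(matched)
```

One caveat with this approach: a row that matches none of the patterns is all FALSE, so max.col would still return the first column. If that can happen in the real data, a check such as rowSums(x) == 0 could be used to set those rows to NA.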

Upvotes: 0

Serge Warde

Reputation: 19

Try this (note that grep with value=TRUE returns the full matching strings, not just the matched word):

test = structure(list(sex1 = c("totalmaleglobal", "totalfemaleglobal", 
                    "totalglobal", "totalfemaleGSK", "totalfemaleglobal", 
                    "totalfemaleUN")), .Names = "sex1", row.names = c(NA, 6L),
                    class="data.frame")

total = grep("total", test[[1]], perl=TRUE, value=TRUE)
totalmale = grep("totalmale", test[[1]], perl=TRUE, value=TRUE)
totalfemale = grep("totalfemale", test[[1]], perl=TRUE, value=TRUE)

print(total)
print(totalmale)
print(totalfemale)

Upvotes: 1

Sotos

Reputation: 51592

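Two steps with gsub: first strip the wanted words to find the leftover suffixes, then strip those suffixes from the original strings.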

v2 <- gsub(paste(v1, collapse='|'), '', d1$sex1)

gsub(paste(v2, collapse='|'), '', d1$sex1)
#[1] "totalmale"   "totalfemale" "total"       "totalfemale" "totalfemale" "totalfemale"

where

v1 <- c('total', 'totalmale', 'totalfemale')
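
The two-step version above appears to depend on how the default regex engine resolves the alternation, and its second pattern is built from the data itself. As a hedged variant (not the answer's exact code), the sketch below sorts the words longest first, uses perl = TRUE so alternatives are tried left to right, and removes the suffix by length instead of by a second regex. The names v1_sorted, suffix, and prefix are introduced here for illustration, and d1 is assumed to hold the question's data.

```r
# Rebuild the example data (assumed to match the question's data frame)
d1 <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

v1 <- c("total", "totalmale", "totalfemale")

# Sort so longer alternatives come first; PCRE tries alternatives left to right
v1_sorted <- v1[order(nchar(v1), decreasing = TRUE)]
pattern <- paste0("^(?:", paste(v1_sorted, collapse = "|"), ")")

# Step 1: strip the known word from the start to find the trailing part
suffix <- sub(pattern, "", d1$sex1, perl = TRUE)

# Step 2: drop that trailing part by length, leaving only the matched word
prefix <- substr(d1$sex1, 1, nchar(d1$sex1) - nchar(suffix))
print(prefix)
```

Cutting by length in the second step avoids problems if a suffix ever contains regex metacharacters such as "." or "(".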

Upvotes: 2

lukeA

Reputation: 54287

Maybe

library(stringr)
daly$sex <- str_extract(daly$sex1,paste(rev(mget(ls(pattern = "pattern\\d+"))), collapse="|"))
daly
#                sex1         sex
# 1   totalmaleglobal   totalmale
# 2 totalfemaleglobal totalfemale
# 3       totalglobal       total
# 4    totalfemaleGSK totalfemale
# 5 totalfemaleglobal totalfemale
# 6     totalfemaleUN totalfemale
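
The rev() call matters because it puts the longest pattern first in the alternation. The sketch below illustrates that with base R and perl = TRUE as a stand-in for stringr (as far as I know, stringr's engine also tries alternatives left to right, but that is an assumption here). The helper extract_first is hypothetical and only wraps regmatches and regexpr.

```r
sex1 <- c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
          "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN")

# Same words, two different orders inside the alternation
short_first <- "total|totalmale|totalfemale"
long_first  <- "totalfemale|totalmale|total"

# Hypothetical helper: return the first match in each string.
# Note: regmatches() drops elements with no match; every string matches here.
extract_first <- function(pattern, x) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

res_short <- extract_first(short_first, sex1)  # "total" wins every time
res_long  <- extract_first(long_first, sex1)   # longest word wins

print(res_short)
print(res_long)
```

With the short word first, every row yields "total" because that alternative already succeeds at the start of each string. Listing the longer words first gives the result shown in the table above.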

Upvotes: 2
