atsang01

Reputation: 35

How to search a string for values from a list of strings and return the matched one in R

The following data frame contains a "Campaign" column whose values hold a season, a name, and a position, but the order of these pieces differs from row to row. Luckily, each piece comes from a fixed list, so we can create vectors to match against the strings in the "Campaign" column.

   Date           Campaign
1 Jan-15   Summer|Peter|Up
2 Feb-15 David|Winter|Down
3 Mar-15   Up|Peter|Spring

Here is what I want to do: create three columns, Name, Season, and Position, each of which searches the string in the Campaign column and returns the matching value from the vectors below.

Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")

So my desired result would be the following:

Temp
    Date          Campaign  Name Season Position
1 Jan-15   Summer|Peter|Up Peter Summer       Up
2 Feb-15 David|Winter|Down David Winter     Down
3 Mar-15   Up|Peter|Spring Peter Spring       Up

Upvotes: 2

Views: 105

Answers (3)

Jonathan Carroll

Reputation: 3947

I had the same idea as Marat Talipov; here's a data.table option:

library(data.table)

Name     <- c("Peter", "David")
Season   <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")

dat <- data.table(Date=c("Jan-15", "Feb-15", "Mar-15"),
                  Campaign=c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"))

Gives

> dat
     Date          Campaign
1: Jan-15   Summer|Peter|Up
2: Feb-15 David|Winter|Down
3: Mar-15   Up|Peter|Spring

Processing is then

dat[ , `:=`(Name     = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Name),
            Season   = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Season),
            Position = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Position))
    ]

Result:

> dat
     Date          Campaign  Name Season Position
1: Jan-15   Summer|Peter|Up Peter Summer       Up
2: Feb-15 David|Winter|Down David Winter     Down
3: Mar-15   Up|Peter|Spring Peter Spring       Up

This may be worthwhile if you have many columns to fill or need to modify the table in place (by reference).

I'm interested if anyone can show me how to update all three columns at once.

EDIT: Never mind, I figured it out:

for (icol in c("Name", "Season", "Position")) 
    dat[, (icol):=sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, get(icol))]
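A possible variation on the loop above, in case you would rather not rely on get() finding global vectors that share their names with the new columns. This is only a sketch: the names `lookups` and `tokens` are my own, and data.table's set() is swapped in for `:=`. It also splits the Campaign strings once instead of once per column.

```r
library(data.table)

dat <- data.table(
  Date     = c("Jan-15", "Feb-15", "Mar-15"),
  Campaign = c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring")
)

# Lookup vectors kept in one named list (an assumption of this sketch);
# the list names become the new column names.
lookups <- list(
  Name     = c("Peter", "David"),
  Season   = c("Summer", "Spring", "Autumn", "Winter"),
  Position = c("Up", "Down")
)

# Split each campaign string once and reuse the tokens for every column.
tokens <- strsplit(dat$Campaign, "|", fixed = TRUE)

# set() adds each column by reference, like := does.
for (col in names(lookups)) {
  set(dat, j = col, value = sapply(tokens, intersect, lookups[[col]]))
}
```

Keeping the lookups in a list means adding a fourth category is a one-line change, and nothing depends on what happens to be defined in the global environment.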

Upvotes: 2

Marat Talipov

Reputation: 13304

Another way:

# Split each campaign string on "|"; as.character() guards against a factor column
L <- strsplit(as.character(df$Campaign), split = '\\|')

df$Name <- sapply(L,intersect,Name)
df$Season <- sapply(L,intersect,Season)
df$Position <- sapply(L,intersect,Position)
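One caveat with sapply() plus intersect(): if a row contains no value from the lookup vector (or more than one), sapply() returns a list rather than a character vector. A minimal sketch of a stricter variant follows, assuming you want NA for "no match" and the first hit otherwise. The helper `match_first` and the sample data with a missing name are hypothetical, not part of the question.

```r
# Hypothetical helper: return the first token found in `choices`,
# or NA when none of the tokens is in `choices`.
match_first <- function(tokens, choices) {
  hit <- intersect(tokens, choices)
  if (length(hit) == 0L) NA_character_ else hit[1L]
}

# Made-up input: the last campaign has no name in it.
campaign <- c("Summer|Peter|Up", "David|Winter|Down", "Up|Spring")
tokens <- strsplit(campaign, "|", fixed = TRUE)

# vapply() guarantees exactly one string per row, or fails loudly.
name_found <- vapply(tokens, match_first, character(1),
                     choices = c("Peter", "David"))
```

With the data from the question every row has exactly one match per category, so both versions give the same columns; the difference only shows up on messier input.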

Upvotes: 3

R. Schifini

Reputation: 9313

Do the following:

Date = c("Jan-15","Feb-15","Mar-15")
Campaign = c("Summer|Peter|Up","David|Winter|Down","Up|Peter|Spring")
df = data.frame(Date,Campaign)

Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")

for(k in Name){
    df$Name[grepl(pattern = k, x = df$Campaign)] <- k
}

for(k in Season){
    df$Season[grepl(pattern = k, x = df$Campaign)] <- k
}

for(k in Position){
    df$Position[grepl(pattern = k, x = df$Campaign)] <- k
}

This gives:

> df
    Date          Campaign  Name Season Position
1 Jan-15   Summer|Peter|Up Peter Summer       Up
2 Feb-15 David|Winter|Down David Winter     Down
3 Mar-15   Up|Peter|Spring Peter Spring       Up
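Note that grepl() matches substrings, so a pattern such as "Up" would also match a campaign containing, say, "Upton". If that matters for your data, one option is a single regular expression with word boundaries per category. The sketch below assumes the lookup values contain no regex metacharacters; the helper name `extract_first` is my own.

```r
campaign <- c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring")

# Hypothetical helper: build a pattern like "\\b(Up|Down)\\b" from the
# lookup vector and return the first match per string, or NA if none.
extract_first <- function(x, choices) {
  pattern <- paste0("\\b(", paste(choices, collapse = "|"), ")\\b")
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the strings that matched, in order.
  out[m != -1] <- regmatches(x, m)
  out
}

position_found <- extract_first(campaign, c("Up", "Down"))
season_found   <- extract_first(campaign, c("Summer", "Spring", "Autumn", "Winter"))
```

This avoids the explicit loops and leaves NA in rows where nothing matches, instead of keeping whatever an earlier iteration wrote.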

Upvotes: 2
