Quality Analyst

Reputation: 11

R programming - Pattern matching (exact word only)

I have just started to learn R programming.

Could you please help me with the question below?

I have an XLS file that contains one column of keywords (e.g. data, data1, test, test1).

I have another XLS file that contains two columns:

(Column 1 : ID1, ID2, ID3, ID4
Column 2 : data, data_analyst, test, test_analyst) 

How do I use pattern matching to find every ID whose Column 2 value exactly matches one of the keywords, and display the name of the matching keyword?

E.g. my output should be as follows:

ID1 : pattern matching (data) 
ID2 : pattern not matching 
ID3 : pattern matching (test)
ID4 : pattern not matching

I would appreciate your response, as I am really confused.

Upvotes: 1

Views: 194

Answers (2)

user28876813

Reputation: 1

Here is one way to accomplish your goal, using the str_like function from stringr:

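library(tibble)   # tibble()
library(dplyr)    # arrange(), mutate(), %>%
library(stringr)  # str_like()

# Second file: IDs and the values to check
df <- tibble(x = c("id1","id2", "id3", "id4"),
             y  = c("data", "data_analyst", "test", "test_analyst"))

# First file: the keywords, sorted so they line up with df row by row
df2 <- tibble(z = c("data1", "data", "test1", "test")) %>% 
  arrange(z)

merged <- cbind(df,df2)

# With no % or _ wildcards in the pattern, str_like() matches the whole string only
merged %>% 
  mutate(pattrn_match = ifelse(str_like(y, "data"),  "pattern matching (data)", 
                               ifelse(str_like(y, "test"), "pattern matching (test)", "pattern not matching" )))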

### final output 
x      y             z      pattrn_match
<chr>  <chr>         <chr>  <chr>
id1    data          data   pattern matching (data)
id2    data_analyst  data1  pattern not matching
id3    test          test   pattern matching (test)
id4    test_analyst  test1  pattern not matching
4 rows
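The answer above hardcodes the two keywords. As a minimal sketch of an alternative, assuming the keywords and IDs are already loaded into R, base R's %in% operator checks exact, whole-string membership against any number of keywords. The names keywords, ids, id, value and result are hypothetical stand-ins for the two spreadsheets, not names from the original files.

```r
# Hypothetical in-memory stand-ins for the two spreadsheets
keywords <- c("data", "data1", "test", "test1")
ids <- data.frame(
  id = c("ID1", "ID2", "ID3", "ID4"),
  value = c("data", "data_analyst", "test", "test_analyst"),
  stringsAsFactors = FALSE
)

# %in% tests exact, whole-string membership, so "data_analyst" does not match "data"
is_match <- ids$value %in% keywords

# Report the matching keyword by name, or a fixed message when nothing matches
ids$result <- ifelse(
  is_match,
  paste0("pattern matching (", ids$value, ")"),
  "pattern not matching"
)

# One line per ID, in the format requested in the question
print(paste(ids$id, ":", ids$result))
```

Because the comparison is on whole strings, no regular expression or word-boundary handling is needed here; that would only become necessary if the keyword had to be found inside a longer text.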

Upvotes: 0

Judu Le

Reputation: 81

The first step is to import the XLS files into R. They will be imported as data frames, but the column names may not be what you expect, so you should also set the names to something you recognize.

library(gdata)  # provides read.xls()
file1 = read.xls("file1", header=TRUE)
file2 = read.xls("file2", header=TRUE)
names(file1) = c("DATA")
names(file2) = c("ID","DATA")

You would then do a merge based on DATA.

matched = merge(file1, file2, by="DATA")

At this point 'matched' contains only the rows whose DATA value appears in both files. So you need to use the match function to find which IDs in 'file2' also appear in 'matched'.

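a = match(file2$ID,matched$ID)  # NA where an ID has no matching keyword
final = file2
names(final) = c("ID","MATCH")
final$MATCH = as.character(final$MATCH)  # in case the column was read in as a factor
# Logical indexing also works when there are no NAs (unlike -which(...))
final[is.na(a),"MATCH"] = "pattern does not match"
final[!is.na(a),"MATCH"] = "pattern matches"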
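To try the merge and match steps without the spreadsheets, here is a small sketch that builds the two data frames in memory instead of calling read.xls. The sample values are taken from the question; the variable found and the wording of the MATCH labels are assumptions chosen to mirror the requested output.

```r
# In-memory stand-ins for the two imported files (values from the question)
file1 <- data.frame(DATA = c("data", "data1", "test", "test1"),
                    stringsAsFactors = FALSE)
file2 <- data.frame(ID = c("ID1", "ID2", "ID3", "ID4"),
                    DATA = c("data", "data_analyst", "test", "test_analyst"),
                    stringsAsFactors = FALSE)

# Inner join: keeps only rows whose DATA value occurs in both data frames
matched <- merge(file1, file2, by = "DATA")

# TRUE for each ID in file2 that survived the merge
found <- !is.na(match(file2$ID, matched$ID))

# Build the result, naming the matched keyword where there is one
final <- data.frame(
  ID = file2$ID,
  MATCH = ifelse(found,
                 paste0("pattern matching (", file2$DATA, ")"),
                 "pattern not matching"),
  stringsAsFactors = FALSE
)
print(final)
```

Since merge compares whole values, this also gives the exact-word behaviour asked for in the question.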

Upvotes: 0
