I have just started learning R programming.
Could you please help me with the question below?
I have one file (XLS) that contains 1 column of keywords: data, data1, test, test1.
I have another file (XLS) that contains 2 columns:
Column 1: ID1, ID2, ID3, ID4
Column 2: data, data_analyst, test, test_analyst
How do I use pattern matching to find, for every ID, whether its value matches one of the keywords, and display the matching keyword?
For example, my output should be as follows:
ID1 : pattern matching (data)
ID2 : pattern not matching
ID3 : pattern matching (test)
ID4 : pattern not matching
I would appreciate any help, as I am really confused.
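Here is one way to accomplish your goal using the `str_like()` function from stringr. `str_like()` uses SQL `LIKE`-style matching against the whole string, so a pattern without wildcards such as `"data"` does not match `"data_analyst"`: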
```r
library(dplyr)    # also re-exports tibble() and %>%
library(stringr)  # str_like() needs stringr >= 1.5.0

df <- tibble(x = c("id1", "id2", "id3", "id4"),
             y = c("data", "data_analyst", "test", "test_analyst"))
df2 <- tibble(z = c("data1", "data", "test1", "test")) %>%
  arrange(z)
merged <- cbind(df, df2)

merged %>%
  mutate(pattrn_match = ifelse(str_like(y, "data"), "pattern matching (data)",
                        ifelse(str_like(y, "test"), "pattern matching (test)",
                               "pattern not matching")))
```
Final output (4 rows, all columns `<chr>`):

| x   | y            | z     | pattrn_match            |
|-----|--------------|-------|-------------------------|
| id1 | data         | data  | pattern matching (data) |
| id2 | data_analyst | data1 | pattern not matching    |
| id3 | test         | test  | pattern matching (test) |
| id4 | test_analyst | test1 | pattern not matching    |
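The `ifelse()` chain above hardcodes `"data"` and `"test"`, and the `z` column is never used in the comparison. A minimal base-R sketch that instead looks each `y` value up in the keyword list could look like the following. It assumes the keywords are available as a character vector (called `keywords` here, a hypothetical stand-in for the first XLS file). `match()` compares whole strings and is case-sensitive, so `data_analyst` is reported as not matching.

```r
# Hypothetical stand-ins for the contents of the two XLS files
keywords <- c("data", "data1", "test", "test1")
df <- data.frame(x = c("id1", "id2", "id3", "id4"),
                 y = c("data", "data_analyst", "test", "test_analyst"))

# Position of each y in the keyword vector; NA when there is no exact match
hit <- match(df$y, keywords)

# Report the keyword that matched, or a "not matching" message
df$pattern_match <- ifelse(is.na(hit),
                           "pattern not matching",
                           paste0("pattern matching (", keywords[hit], ")"))
df
```

Because the keywords are read from data rather than typed into the code, adding a new keyword only means adding a row to the keyword file.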
The first step is to import the XLS files into R, for example with `read.xls()` from the gdata package (which needs Perl installed). This imports them as data frames, but the column names may not be what you expect, so you should also set names you recognize.
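```r
library(gdata)  # provides read.xls()

file1 <- read.xls("file1", header = TRUE)  # path to the keyword file
file2 <- read.xls("file2", header = TRUE)  # path to the ID/value file
names(file1) <- c("DATA")
names(file2) <- c("ID", "DATA")
```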
You would then merge the two data frames on DATA:
```r
matched <- merge(file1, file2, by = "DATA")
```
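To see what `merge()` returns, here is a sketch on hypothetical inline stand-ins for the two files (using the values listed in the question). By default `merge()` performs an inner join, so only rows whose `DATA` value appears exactly in both data frames survive:

```r
# Hypothetical stand-ins for file1 and file2 after renaming the columns
file1 <- data.frame(DATA = c("data", "data1", "test", "test1"))
file2 <- data.frame(ID = c("ID1", "ID2", "ID3", "ID4"),
                    DATA = c("data", "data_analyst", "test", "test_analyst"))

# Inner join on DATA: keeps only values present in both frames
matched <- merge(file1, file2, by = "DATA")
matched
```

Here `matched` holds only the rows for `ID1` (`data`) and `ID3` (`test`); `ID2` and `ID4` drop out because `data_analyst` and `test_analyst` are not keywords.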
At this point `matched` contains only the rows whose DATA value appears in both files. Next, use the `match()` function to find which IDs in `file2` also appear in `matched`; IDs without a match get `NA`.
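```r
# Position of each file2 ID in matched; NA if the ID had no matching keyword
a <- match(file2$ID, matched$ID)

final <- file2
names(final) <- c("ID", "MATCH")
# Logical indexing also works when all or none of the IDs match
final[is.na(a), "MATCH"] <- "pattern does not match"
final[!is.na(a), "MATCH"] <- "pattern matches"
```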
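Putting the steps together on the same kind of hypothetical stand-in data, a sketch that also reports the matching keyword and prints the result in the format the question asks for (`ID1 : pattern matching (data)`) could look like this. It uses `%in%`, which is equivalent to checking `!is.na(match(...))`:

```r
# Hypothetical stand-ins for the two XLS files
file1 <- data.frame(DATA = c("data", "data1", "test", "test1"))
file2 <- data.frame(ID = c("ID1", "ID2", "ID3", "ID4"),
                    DATA = c("data", "data_analyst", "test", "test_analyst"))

matched <- merge(file1, file2, by = "DATA")

# TRUE for the IDs that survived the inner join
is_match <- file2$ID %in% matched$ID

result <- ifelse(is_match,
                 paste0("pattern matching (", file2$DATA, ")"),
                 "pattern not matching")
out <- paste(file2$ID, ":", result)
writeLines(out)
```

This prints one line per ID, `ID1 : pattern matching (data)` through `ID4 : pattern not matching`, matching the expected output in the question.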