makoLP

Reputation: 23

How to replace a string with a part of itself

I have a column in a data.table in R that looks like this:

[1] "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\",
[2] "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\",
[3] "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\",
[4] "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\",
[5] "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\",

But the only thing I care about is whether it is "UNIT_RESULT", "UNIT_CHECKIN", "OEE_DATA" or "PING", so I would like to replace each row with the corresponding string ("UNIT_RESULT" etc.).

The result should look like:

[1] "UNIT_RESULT"
[2] "UNIT_CHECKIN"
[3] "UNIT_CHECKIN"
[4] "OEE_DATA"
[5] "PING"

I have spent many hours trying to find out how to replace a string with a part of itself, but nothing I found gave me a useful result.

Replace specific characters within strings

Reference - What does this regex mean?

Test if characters in string in R

At first, substring(x, 53, 63) looked like a solution, but it only selects characters at fixed positions, so it is useless unless the part I want sits at the same position in every row.

Any hints?

Upvotes: 1

Views: 93

Answers (4)

Stan

Reputation: 995

If you do not have a finite list of strings to search for, I would recommend using a regular expression. Here is one that works on the examples you provided:

# Code to create example data.table
library(data.table)

dt <- data.table(f1 =  c("<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\"",
                     "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
                     "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
                     "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\"",
                     "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\""
))

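# Start of code to parse out values:
# The pattern matches two or more upper-case letters, optional underscores,
# then more upper-case letters, sitting directly between two double quotes.
rex_pattern <- "(?<=(\"))[A-Z]{2,}_*[A-Z]+(?=(\"))"

# regexpr() finds the first match per row; regmatches() extracts the matched text.
dt[, .(parsed_val = regmatches(f1, regexpr(pattern = rex_pattern, f1, perl = TRUE)))]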

This gives you:

     parsed_val
1:  UNIT_RESULT
2: UNIT_CHECKIN
3: UNIT_CHECKIN
4:     OEE_DATA
5:         PING 

If you really want to "overwrite" the original field f1 with the new substring, you can use the following:

dt[, `:=`(f1 = regmatches(f1, regexpr(pattern = rex_pattern, f1, perl = TRUE)))]
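
One caveat with this approach: regmatches() combined with regexpr() silently drops elements that have no match, so the result can be shorter than the column. Below is a minimal base R sketch of a length-preserving variant; the helper name extract_first and the third test string are hypothetical and only there for illustration.

```r
# Hypothetical helper: first regex match per element, NA where nothing matches.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m != -1          # elements that actually matched
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # regmatches() returns matches only
  out
}

x <- c("BODY: '{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\"",
       "BODY: '{\"Parameters\":[\"PING\"",
       "no upper-case token here")    # made-up row without a match

res <- extract_first(x, "(?<=\")[A-Z]{2,}_*[A-Z]+(?=\")")
print(res)
```

Because the result always has the same length as the input, it should be safe to assign back with `:=` even when some rows do not contain a token.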

Upvotes: 0

divibisan

Reputation: 12155

The str_match_all function applies a regex to each element of a vector of strings and returns only the matches, as a list with one character matrix per input string. So we can make a vector of all the terms we want to extract and use paste0 to join them with the | (OR) operator into a single regular expression that matches any of the 4 desired terms.

Then we just run the str_match_all function and unlist the resulting list into a character vector.

strings <- c("<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_RESULT\",\"SK190400\"",
             "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
             "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"UNIT_CHECKIN\",\"SK190400\"",
             "=> MSG: 'MessageReq', BODY: '{\"MessageReq\":{\"Parameters\":[\"OEE_DATA\"",
             "<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\""
)

items <- c('UNIT_RESULT', 'UNIT_CHECKIN', 'OEE_DATA', 'PING')

library(stringr)
unlist(str_match_all(strings, paste0(items,collapse = '|')))
[1] "UNIT_RESULT"  "UNIT_CHECKIN" "UNIT_CHECKIN" "OEE_DATA"     "PING"        
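
Note that extracting all matches and flattening them only lines up with the input when every string contains exactly one of the terms. The following sketch illustrates the difference in base R, using gregexpr() (all matches) and regexpr() (first match) in place of stringr; the two input strings are made up for the example.

```r
items <- c("UNIT_RESULT", "UNIT_CHECKIN", "OEE_DATA", "PING")
pattern <- paste0(items, collapse = "|")

# Hypothetical input: the second string mentions two of the terms.
x <- c("Parameters:[\"OEE_DATA\"",
       "Parameters:[\"PING\",\"UNIT_RESULT\"")

# All matches, flattened: one entry per match, not per input string.
all_matches <- unlist(regmatches(x, gregexpr(pattern, x)))

# First match only: one entry per input string that has a match.
first_matches <- regmatches(x, regexpr(pattern, x))

print(all_matches)
print(first_matches)
```

If the extracted values are going back into a column, the first-match form is usually the one that keeps rows aligned.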

Upvotes: 1

Nicolas2

Reputation: 2210

I suggest replacing the whole string with the captured term:

gsub("^.*?(UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING).*","\\1",strings,perl=TRUE)
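
Something to keep in mind with this kind of replacement: a string that contains none of the terms does not match the pattern at all, so gsub() returns it unchanged. A small sketch of how that could be detected and turned into NA, assuming a made-up second string that contains no known term:

```r
strings <- c("<= MSG: 'ACK', BODY: '{\"MessageRep\":{\"Parameters\":[\"PING\",\"SK190400\"",
             "=> MSG: 'Other', BODY: '{}'")   # hypothetical row without a known term

pattern <- "^.*?(UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING).*"

# Replace the whole string with the captured term; non-matching strings stay as they are.
raw <- gsub(pattern, "\\1", strings, perl = TRUE)

# Mark the strings that never matched as NA instead of keeping the original text.
out <- raw
out[!grepl(pattern, strings, perl = TRUE)] <- NA_character_

print(raw)
print(out)
```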

Upvotes: 0

Luis

Reputation: 639

An alternative is to use str_extract. You pass your string as the 'string' argument and the alternatives you gave as the 'pattern' argument, and it returns whichever of your alternatives appears first in the string, or NA if none of them appears.

library(stringr)

DT[, newstring := str_extract(string_column, "UNIT_RESULT|UNIT_CHECKIN|OEE_DATA|PING")]

Upvotes: 0
