b1geyedeer

Reputation: 101

String Splits with Regex

I need help with regex to split up strings in my log line. Log message as follows:

Token1|Token2|Token3|Token4|Token5|Token6|Token7|key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\330\302\037\262\220\333J;\242.\031z0x\334\177L keyType=web

Given the following:

message = "Token1|Token2|Token3|Token4|Token5|Token6|Token7|key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"
hash = {}
hash['T1'], hash['T2'], hash['T3'], hash['T4'], hash['T5'], hash['T6'], hash['T7'], message = message.split /(?<!\\)[\|]/

This also splits on the pipe inside the keyUrl value, so message ends up holding only the following; the payload and the key/value pairs after it are lost:

key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~

I've tried various permutations of regex patterns but I'm stumped. Could anyone suggest a better pattern than message.split /(?<!\\)[\|]/? Many thanks.

Edited: The result that I'm aiming for is:

puts hash
{"T1"=>"Token1", "T2"=>"Token2", "T3"=>"Token3", "T4"=>"Token4", "T5"=>"Token5", "T6"=>"Token6", "T7"=>"Token7"}

puts message
key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web

Hope that helps to clarify and many thanks again for the attempts to assist.

Upvotes: 1

Views: 80

Answers (2)

seph

Reputation: 6076

Looks like you're just doing a split on |. You can just do this: ...split("|"). You can gather the remaining bits like this:

...,hash['T7'], *messages = message.split("|")
messages
=> ["key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~", "\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"]

If you want the whole string after Token7| you can join them back together like this:

message = messages.join("|")
=> "key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"

EDIT: now if you print it out

puts message

you get:

key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\330\302\037\262\220\333J;\242.\031z0x\334\177L keyType=web
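If the goal is only to keep everything after the last token in one piece, the optional limit argument of String#split may save the split-then-join round trip. The sketch below uses a shortened stand-in for the log line (three tokens instead of seven, so the limit is 4; for the full line with seven tokens it would presumably be 8). The variable name text and the shortened payload are assumptions for illustration, not taken from the real log.

```ruby
# Shortened stand-in for the log line: three tokens, then a payload
# that itself contains a "|" (assumption: same shape as the real line).
text = "Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\\330 keyType=web"

hash = {}

# A positive limit caps the number of fields returned; the last field
# holds the rest of the string, including any further pipes.
hash['T1'], hash['T2'], hash['T3'], message = text.split('|', 4)

puts hash.inspect
puts message
```

With the limit in place, message keeps the pipe and everything after it, so no join is needed afterwards.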

Upvotes: 1

Cary Swoveland

Reputation: 110755

Here is your example text simplified somewhat:

text = "Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\\330"

It is my understanding that you want the following:

hash = {}
hash['T1'], hash['T2'], hash['T3'], message = text.split('|')
  #=> ["Token1", "Token2", "Token3",
  #    "key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
hash
  #=> {"T1"=>"Token1", "T2"=>"Token2", "T3"=>"Token3"} 
message
  #=> "key1=abc key2=89042683 no-cache::~~::~~" 

Please let me know if my assumption is incorrect.

Edit: In view of your comment, is it more like:

hash['T1'], hash['T2'], hash['T3'], message, keyurl = text.split('|')
  #=> ["Token1", "Token2", "Token3",
  #    "key1=abc key2=89042683 no-cache::~~::~~", "\\330"] 

or

hash['T1'], hash['T2'], hash['T3'], *messages = text.split('|')
  #=> ["Token1", "Token2", "Token3",
  #    "key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
messages
  #=> ["key1=abc key2=89042683 no-cache::~~::~~", "\\330"]

that you want?
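For what it's worth, here is a small sketch of why the original attempt seemed to lose the tail, using the same simplified text as above. As far as I can tell, two things are at play: the lookbehind in /(?<!\\)[\|]/ only skips a pipe that comes right after a backslash, whereas here the backslash comes after the pipe; and multiple assignment without a splat quietly drops any surplus elements. The names by_regex, by_string and rest are just illustrative.

```ruby
text = "Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\\330"

# The lookbehind rejects only a pipe PRECEDED by a backslash. In this
# text the backslash FOLLOWS the pipe, so the regex splits exactly
# where a plain string split does.
by_regex  = text.split(/(?<!\\)\|/)
by_string = text.split('|')
puts by_regex == by_string

# Five pieces, four targets: the fifth piece is silently discarded.
_t1, _t2, _t3, message = by_string
puts message

# A splat collects the surplus pieces, and join restores the tail.
_t1, _t2, _t3, *rest = by_string
puts rest.join('|')
```

So the split itself was never truncating anything; the last piece was simply left unassigned.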

Upvotes: 0
