I need help with a regex to split up the fields in my log line. The log message is as follows:
Token1|Token2|Token3|Token4|Token5|Token6|Token7|key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\330\302\037\262\220\333J;\242.\031z0x\334\177L keyType=web
Given the following:
message = "Token1|Token2|Token3|Token4|Token5|Token6|Token7|key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"
hash = {}
hash['T1'], hash['T2'], hash['T3'], hash['T4'], hash['T5'], hash['T6'], hash['T7'], message = message.split /(?<!\\)[\|]/
It also splits the keyUrl value at the | in front of the payload, leaving only the following in message
and truncating the payload and the subsequent key values in the log line:
key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~
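A minimal sketch of where the tail goes, using a shortened placeholder line (the field values are made up, not taken from the real log). The lookbehind only protects pipes that follow a backslash, so the pipe in front of the payload is still a split point and split returns nine pieces; multiple assignment with eight targets keeps the eighth piece and silently drops the ninth.

```ruby
# Shortened placeholder line: seven tokens, then the key=value section,
# which itself contains a '|' in front of the payload.
line = "T1|T2|T3|T4|T5|T6|T7|key1=abc keyUrl=POST /b/opt/ HTTP/1.1::~~|\\330 keyType=web"

# Same pattern as in the question: split on '|' unless preceded by a backslash.
# The '|' before "\330" is preceded by '~', so it is still a split point.
pieces = line.split(/(?<!\\)[\|]/)
puts pieces.length

# Eight targets on the left: the 8th piece lands in message,
# the 9th (payload plus keyType=web) is discarded without any error.
hash = {}
hash['T1'], hash['T2'], hash['T3'], hash['T4'], hash['T5'], hash['T6'], hash['T7'], message = pieces
puts message
```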
I've been trying various permutations of regex patterns but I'm stumped, and I was wondering if anyone could help with a better pattern than message.split /(?<!\\)[\|]/
Many thanks.
Edit: The result that I'm aiming for is:
puts hash
{"T1"=>"Token1", "T2"=>"Token2", "T3"=>"Token3", "T4"=>"Token4", "T5"=>"Token5", "T6"=>"Token6", "T7"=>"Token7"}
puts message
key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\330\302\037\262\220\333J;\242.\031z0x\334\177L keyType=web
I hope that helps to clarify. Many thanks again for the attempts to assist.
Looks like you're just doing a split on |, so you can simply do ...split("|"). You can gather the remaining pieces with a splat, like this:
...,hash['T7'], *messages = message.split("|")
messages
=> ["key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~", "\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"]
If you want the whole string after Token7|, you can join the pieces back together like this:
message = messages.join("|")
=> "key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\\330\\302\\037\\262\\220\\333J;\\242.\\031z0x\\334\\177L keyType=web"
EDIT: now if you print it out
puts message
you get:
key1=abc key2=89042683 keytransport=tcp keyUrl=POST /b/opt/ HTTP/1.1::~~Accept: */*::~~Content-Type: application/octet-stream::~~Connection: Close::~~User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Win64; x64; Trident/6.0)::~~Host: something.su::~~Content-Length: 185::~~Cache-Control: no-cache::~~::~~|\330\302\037\262\220\333J;\242.\031z0x\334\177L keyType=web
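The two outputs above differ only in how the string is displayed. In a double-quoted literal, and in inspect output (which irb uses for its => lines), a single backslash is written as \\; puts prints the characters themselves. A short sketch using a fragment of the payload:

```ruby
# In a double-quoted literal, "\\" is ONE backslash character.
payload = "\\330"

puts payload.length  # a backslash followed by '3', '3', '0'
puts payload         # puts writes the raw characters: \330
p payload            # p uses inspect, which escapes the backslash again
```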
Here is your example text simplified somewhat:
text = "Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\\330"
It is my understanding that you want the following:
hash = {}
hash['T1'], hash['T2'], hash['T3'], message = text.split('|')
#=> ["Token1", "Token2", "Token3", "key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
hash
#=> {"T1"=>"Token1", "T2"=>"Token2", "T3"=>"Token3"}
message
#=> "key1=abc key2=89042683 no-cache::~~::~~"
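With seven leading tokens, the hash['T1'], hash['T2'], ... list gets long. A sketch of generating the keys from the token positions instead, on the same simplified text; token_count is a hypothetical variable introduced here, set to 3 for this shortened example (it would be 7 for the real log line):

```ruby
text = "Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\\330"
token_count = 3  # hypothetical: number of leading tokens in this example

fields  = text.split('|')
tokens  = fields.first(token_count)  # the leading tokens only
message = fields[token_count]        # the field right after them, as above

# Pair each token with its 1-based position to get "T1", "T2", ... keys.
hash = tokens.each_with_index.map { |tok, i| ["T#{i + 1}", tok] }.to_h
p hash
```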
Please let me know if my assumption is incorrect.
Edit: In view of your comment, is it more like:
hash['T1'], hash['T2'], hash['T3'], message, keyurl = text.split('|')
#=> ["Token1", "Token2", "Token3",
# "key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
or
hash['T1'], hash['T2'], hash['T3'], *messages = text.split('|')
#=> ["Token1", "Token2", "Token3",
# "key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
messages
#=> ["key1=abc key2=89042683 no-cache::~~::~~", "\\330"]
that you want?
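If what is wanted is the entire remainder as one string (the desired output in the question keeps the |\330... part), String#split also accepts a limit: with a positive limit n it returns at most n pieces, and the last piece holds the unsplit remainder. A sketch on the same simplified text, assuming the leading tokens themselves never contain a |:

```ruby
text = "Token1|Token2|Token3|key1=abc key2=89042683 no-cache::~~::~~|\\330"

hash = {}
# Limit 4 = 3 tokens + 1 remainder; any later '|' stays inside message.
hash['T1'], hash['T2'], hash['T3'], message = text.split('|', 4)

p hash
puts message  # key1=abc key2=89042683 no-cache::~~::~~|\330
```

For the full log line in the question, the same idea would be split('|', 8): seven tokens plus the remainder, with no need to join the pieces back together.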