Reputation: 22681
I am trying to parse a text like this from a log file:
[2016-01-29 11:31:33,809: WARNING/Worker-1283] 1030140:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,103: WARNING/Worker-1197] 1025311:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,291: WARNING/Worker-1197] 1025158:::DEAL_OF_DAY:::29:::1:::11
I want to extract these numbers 1030140, 1025311, 1025158 and so on.
I have tried the following
cat deals29.txt | egrep -o '[0-9]+'
But this gives other digits as well
I tried
cat deals29.txt | egrep -o ' [0-9]+:::'
but now it gives the colons in the output as well and there is no way to capture the group in the command line version of grep.
Any suggestions? A grep solution would be preferred, but I can go with sed/awk as well if grep cannot do the job.
Upvotes: 1
Views: 60
Reputation: 43169
You could use a solution like:
(\d{3,})::
# looks for at least 3 digits (or more) followed by two colons
# puts the matched numbers in group 1
See a demo for this approach here.
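Since grep -o prints the whole match rather than a single group, one way to apply the group-1 idea on the command line is sed. This is a minimal sketch, assuming a sed that accepts -E (GNU and BSD sed do). The log_sample variable is a hypothetical stand-in for the log file, and \d is written as [0-9] because sed's extended regex syntax has no \d:

```shell
# Hypothetical sample standing in for the real log file.
log_sample='[2016-01-29 11:31:33,809: WARNING/Worker-1283] 1030140:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,103: WARNING/Worker-1197] 1025311:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,291: WARNING/Worker-1197] 1025158:::DEAL_OF_DAY:::29:::1:::11'

# -n suppresses default output; the trailing p prints only lines where the
# substitution succeeded. \1 is the captured run of 3+ digits that comes
# right after "] " and right before "::".
ids=$(printf '%s\n' "$log_sample" | sed -nE 's/^.*] ([0-9]{3,})::.*$/\1/p')

printf '%s\n' "$ids"
```

With a real file, the same sed command can take the file name as its last argument instead of reading from the pipe.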
Upvotes: 0
Reputation: 785146
Using grep -oP and the match reset \K:
grep -oP '^\[.*?\] \K\d+' file.log
1030140
1025311
1025158
If your grep doesn't support -P (PCRE), then use awk:
awk -F '\\] |:::' '{print $2}' file.log
1030140
1025311
1025158
Upvotes: 2
Reputation: 19
You can practice regular expressions at https://regex101.com/
I came up with
] [0-9]*
and then you have to delete the first 2 characters of each match.
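The two steps can be chained in one pipeline. This is a rough sketch, assuming a grep that supports -o and -E. The log_sample variable is a hypothetical stand-in for the log file, and + is used instead of * so that a "] " with no digits after it is not matched:

```shell
# Hypothetical sample standing in for the real log file.
log_sample='[2016-01-29 11:31:33,809: WARNING/Worker-1283] 1030140:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,103: WARNING/Worker-1197] 1025311:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,291: WARNING/Worker-1197] 1025158:::DEAL_OF_DAY:::29:::1:::11'

# Step 1: grep -oE prints only the matched part, e.g. "] 1030140".
# Step 2: cut -c3- drops the first 2 characters ("] ") of each match.
numbers=$(printf '%s\n' "$log_sample" | grep -oE '] [0-9]+' | cut -c3-)

printf '%s\n' "$numbers"
```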
Upvotes: 0