Hemanth

Reputation: 161

Extract lines between two expressions of a file inside bash script (using regexp, sed)

I have a log file with many lines. For further analysis, I need a bash script that extracts the lines from a session's start marker to its end marker.

...
...

## TSM-INSTALL SESSION (pid) started at yyyy/mm/dd hh:mm:ss for host (variable) ##
...
...
...
...
...
...
...
## TSM-INSTALL SESSION (pid) ended at yyyy/mm/dd hh:mm:ss for host (variable) ##

...
...

I googled and found a sed expression that extracts the lines between two patterns:

sed '/start_pattern_here/,/end_pattern_here/!d' inputfile

But I can't find a regular expression pattern that matches my start and end lines.

I'm a novice with regular expressions. Here are all the expressions (silly ones included) that I've tried inside the script:

sed '/\.* started at \.* $server ##/,/\.* ended at \.* $server ##/!d' file

sed '/## TSM-INSTALL SESSION [0-9]\+ started at [0-9|\\|:]\+ for host $server ##/,/## TSM-INSTALL SESSION [0-9]\+ ended at [0-9|\\|:]\+ for host $server ##/!d' file

sed '/.\{30\}started{34\}$server ##$/,/.\{30\}ended{34\}$server ##$/!d' file

sed '/.## TSM-INSTALL SESSION\{6\}started at\{31\}$server ##$/,/.## TSM-INSTALL SESSION\{6\}ended at\{31\}$server ##$/!d' file

sed '/## TSM-INSTALL SESSION [0-9]+ started at .* $server/,/## TSM-INSTALL SESSION [0-9]+ ended at .* $server/!d' file

sed '/## TSM-INSTALL SESSION \.\.\.\.\. started at \.\.\.\.\.\.\.\.\.\. \.\.\.\.\.\.\.\. for host $server ##/,/## TSM-INSTALL SESSION \.\.\.\.\. ended at \.\.\.\.\.\.\.\.\.\. \.\.\.\.\.\.\.\. for host $server ##/!d' file

Upvotes: 2

Views: 144

Answers (2)

kdubs

Reputation: 1722

If you put this in a file called file.sed:

/^## TSM-INSTALL SESSION ([0-9][0-9]*) started at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/,/^## TSM-INSTALL SESSION ([0-9][0-9]*) ended at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/p

and then call it like

sed -n -f file.sed inputfile 

I think it will do what you want.

The -n option suppresses sed's automatic printing, so the only output is what the p command prints explicitly: the lines from the start match through the end match.
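As a rough check of the approach above, here is a minimal sketch that builds a small sample log and runs the script against it. The pid, timestamps, and host name are invented, and it assumes the parentheses appear literally in the log (in sed's basic regular expressions an unescaped `(` or `)` matches itself). The pattern uses `for host`, as spelled in the log line.

```shell
#!/bin/sh
# Hypothetical sample data: pid, timestamps and host name are invented,
# and the parentheses are assumed to be literal characters in the log.
workdir=$(mktemp -d)
cat > "$workdir/inputfile" <<'EOF'
unrelated line before
## TSM-INSTALL SESSION (1234) started at 2012/01/31 10:15:00 for host (alpha) ##
installing package
verifying package
## TSM-INSTALL SESSION (1234) ended at 2012/01/31 10:20:00 for host (alpha) ##
unrelated line after
EOF

# The sed script: a single range address followed by the p (print) command.
cat > "$workdir/file.sed" <<'EOF'
/^## TSM-INSTALL SESSION ([0-9][0-9]*) started at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/,/^## TSM-INSTALL SESSION ([0-9][0-9]*) ended at [0-9][0-9]*\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] for host ([^)]*) ##/p
EOF

# -n turns off automatic printing, so only lines printed by p are captured.
extracted=$(sed -n -f "$workdir/file.sed" "$workdir/inputfile")
printf '%s\n' "$extracted"

rm -rf "$workdir"
```

If the real log has no parentheses around the pid and host, the `(` and `)` characters would need to be dropped from the pattern.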

Upvotes: 0

Vercingatorix

Reputation: 1884

Why not:

$(sed "/^## TSM-INSTALL SESSION .* started .* $server ##/,/^## TSM-INSTALL SESSION .* ended .* $server ##/!d" file)

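You don't need to get fancy with the regexps. All you care about is the leading TSM-INSTALL SESSION, the started or ended, and the hostname, so use .* to mean "whatever is in between". Note the double quotes around the sed expression: inside single quotes the shell does not expand $server, so sed receives the literal text $server instead of the host name.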
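To illustrate the quoting point, here is a small sketch that runs the same expression with single and with double quotes. The host names, pids, and timestamps are made up, and it assumes the host appears in the log without surrounding parentheses; the variable names `single` and `double` are only for illustration.

```shell
#!/bin/sh
# Hypothetical host name and sample log; all values are invented.
server=alpha
log=$(mktemp)
cat > "$log" <<'EOF'
unrelated line before
## TSM-INSTALL SESSION 1234 started at 2012/01/31 10:15:00 for host alpha ##
installing package
## TSM-INSTALL SESSION 1234 ended at 2012/01/31 10:16:00 for host alpha ##
## TSM-INSTALL SESSION 5678 started at 2012/01/31 11:00:00 for host beta ##
other host
## TSM-INSTALL SESSION 5678 ended at 2012/01/31 11:05:00 for host beta ##
EOF

# Single quotes: the shell passes "$server" through unexpanded, so nothing
# matches and every line is deleted.
single=$(sed '/^## TSM-INSTALL SESSION .* started .* $server ##/,/^## TSM-INSTALL SESSION .* ended .* $server ##/!d' "$log")

# Double quotes: the shell substitutes the value of $server before sed runs,
# so only the session for that host is kept.
double=$(sed "/^## TSM-INSTALL SESSION .* started .* $server ##/,/^## TSM-INSTALL SESSION .* ended .* $server ##/!d" "$log")

printf 'single-quoted result: [%s]\n' "$single"
printf 'double-quoted result:\n%s\n' "$double"

rm -f "$log"
```

Capturing the output in a variable, as done here, is one way to use the command substitution; written on a line by itself, `$(sed ...)` would make the shell try to run the extracted lines as a command.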

Upvotes: 3
