GalB1t

Reputation: 265

Sed regex string substitution from terminal

I have a log file with a standard format, e.g.:

31 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom2
31 Mar - Lorem Ipsom3

The replacement I want to make is 31*31 to 31, so that I end up with a log that has only its last line. In this example it would look like:

31 Mar - Lorem Ipsom3

I want to do this on a customized Linux machine that has no Perl. I tried to use sed like this:

sed -i -- 's/31*31/31/g' /var/log/prog/logFile

But it did nothing. Alternatives involving ninja bash commands are also welcome.

Upvotes: 1

Views: 148

Answers (3)

Alan Dyke

Reputation: 914

I think you might be looking for "tail" to get the last line of the file, e.g.

tail -n 1 /path/file

or if you want the last entry from each day then "sort" might be your solution

sort -ur -k 1,2 /path/file | sort
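  • the -u flag outputs only one line for each distinct value of the key fields
  • the -k 1,2 specifies that the key fields are the first two fields, in this case the day and the month; fields are separated by white space by default.
  • the -r flag reverses the sort order, with the intent that the last match for each date is the one returned. Sort a second time to put the result back in ascending order. Which of several equal-key lines -u keeps is not guaranteed everywhere (GNU sort documents that it keeps the first of each run in input order), so check the result on your system.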

If your log file has more than a single month of data, and you wish to preserve order (e.g. if you have 31 Mar and 1 Apr in the same file), you can try:

cat -n tmp2 | sort -nr | sort -u -k 2,3 | sort -n | cut -f 2-
  • cat -n adds the line number to the log file before sorting.
  • sort -nr reverses the file by line number, then sort -u -k 2,3 works as before but on fields 2 and 3, because field 1 is now the original line number
  • sort by the original line number to restore the original order.
  • use cut to remove the line numbers and restore the original line content.

e.g.

 $ cat tmp2
 30 Mar - Lorem Ipsom2
 30 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom2
 31 Mar - Lorem Ipsom3
 1 Apr - Lorem Ipsom1
 1 Apr - Lorem Ipsom2

 $ cat -n tmp2 | sort -nr | sort -u -k 2,3 | sort -n | cut -f 2-
 30 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom3
 1 Apr - Lorem Ipsom2

Upvotes: 0

willeM_ Van Onsem

Reputation: 477170

* is not a wildcard as it is in the shell; in a regex it is a quantifier, so 31*31 means a 3, zero or more 1s, and then 31. You need to quantify over . (any character). The regex is thus:

sed ':a;N;$!ba;s/31.*31/31/g'

(I removed the -i flag so you can first test your file safely).

The :a;N;$!ba; part reads the whole input into the pattern space, which makes it possible for the regex to match across newlines.
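Both points can be checked from a terminal. This is only a sketch: it assumes GNU sed (the one-line :a;N;$!ba form relies on it), and the sample log is made up in the question's format.

```shell
# Illustrative sample in the question's format.
log='31 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom2
31 Mar - Lorem Ipsom3'

# 31*31 needs "3, any number of 1s, 31"; no log line contains that,
# so sed prints the input unchanged.
printf '%s\n' "$log" | sed 's/31*31/31/g'

# Strings that do contain it are replaced as a whole.
printf '%s\n' 331 3131 31131 | sed 's/31*31/X/'

# With the whole input in the pattern space, 31.*31 spans the newlines
# and only the last line survives.
printf '%s\n' "$log" | sed ':a;N;$!ba;s/31.*31/31/g'
```

The first command is the one from the question without -i, so nothing is written back to a file while testing.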

Note however:

  • The regex will match a 31 anywhere in a line, so:

    31 Mar - Lorem Ipsom1
    31 Mar - Lorem 31 Ipsom2

    will result in

    31 Ipsom2

  • It matches greedily, so if the log reads:

    31 Mar - Lorem Ipsom1
    30 Mar - Lorem Ipsom2
    31 Mar - Lorem Ipsom3

    it will also remove the second line.

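You can solve the first problem by writing (with GNU sed, where -E enables the extended syntax that ( and | need):

sed -E ':a;N;$!ba;s/(^|\n)31.*\n31/\131/g'

This forces both 31s to be located at the beginning of a line. Read \131 as \1 followed by 31: the \1 puts back the newline, if any, that the group consumed in front of the first 31.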
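As a rough check of the anchoring, the two expressions can be compared on the problem input from the first note. The sketch assumes GNU sed and uses -E so that ( and | act as grouping and alternation; the \1 in the replacement is there to keep the newline in front of the block, and the sample data is illustrative.

```shell
sample='31 Mar - Lorem Ipsom1
31 Mar - Lorem 31 Ipsom2'

# Unanchored: the match ends at the 31 inside the second line.
printf '%s\n' "$sample" | sed ':a;N;$!ba;s/31.*31/31/g'
# prints: 31 Ipsom2

# Anchored: both 31s must start a line, so the second line stays whole.
printf '%s\n' "$sample" | sed -E ':a;N;$!ba;s/(^|\n)31.*\n31/\131/g'
# prints: 31 Mar - Lorem 31 Ipsom2

# With an earlier line in front of the block, \1 restores the newline
# that the group matched, so that line is not joined to the result.
printf '%s\n' '30 Mar - Lorem Ipsom0' "$sample" |
  sed -E ':a;N;$!ba;s/(^|\n)31.*\n31/\131/g'
```

The greedy behaviour from the second note is unchanged: a 30 Mar line sitting between two 31 lines would still be removed.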

Upvotes: 2

Wintermute

Reputation: 44063

A way to keep only the last of consecutive lines that match a pattern is

sed -n '/^31/ { :a $!{ h; n; //ba; x; G } }; p' filename

This works as follows:

/^31/ {    # if a line begins with 31
  :a       # jump label for looping

  $!{      # if the end of input has not been reached (otherwise the current
           # line is the last line of the block by virtue of being the last
           # line)

    h      # hold the current line
    n      # fetch the next line. (note that this doesn't print the line
           # because of -n)

    //ba   # if that line also begins with 31, go to :a. // attempts the
           # most recently attempted regex again, which was ^31

    x      # swap hold buffer, pattern space
    G      # append hold buffer to pattern space. The PS now contains
           # the last line of the block followed by the first line that 
           # comes after it
  }
}
p          # in the end, print the result

This avoids some problems of multi-line regular expressions, such as matches that begin or end in the middle of a line. It also keeps the lines between two blocks of matching lines, together with the last line of each block.
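To see the per-block behaviour, the script can be run on a small sample with two separate blocks of 31 lines. This is a sketch only: the function name keep_last_of_block and the sample data are made up for illustration, and the script is written in its multi-line form so that each label sits on its own line.

```shell
# Hypothetical wrapper around the script above; reads the log on stdin.
keep_last_of_block() {
  sed -n '
/^31/ {
  :a
  $!{
    h
    n
    //ba
    x
    G
  }
}
p'
}

sample='30 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom2
1 Apr - Lorem Ipsom1
31 Mar - Lorem Ipsom3
31 Mar - Lorem Ipsom4'

printf '%s\n' "$sample" | keep_last_of_block
# prints:
# 30 Mar - Lorem Ipsom1
# 31 Mar - Lorem Ipsom2
# 1 Apr - Lorem Ipsom1
# 31 Mar - Lorem Ipsom4
```

The 30 Mar and 1 Apr lines pass through untouched, and each run of 31 lines is reduced to its last line.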

Upvotes: 4
