Reputation: 24721

Delete regex group from matching line using grep or sed

I have a file with contents as this:

- 2 equal files of size 288903252
- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 277436598
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

I want to delete the "- X equal files of size" lines that have no file paths following them (for example, the first and third bullet points above), so that the result is:

- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

I formed a regex that matches these lines:

(^-.*\n)-

which can be checked in action at the link above. I want to delete that first group, which is essentially the whole line, but I can't figure out how to do this with grep or sed. Can it be done in a single command?

Upvotes: 0

Answers (4)

123

Reputation: 11216

Using sed

sed '/^-/{N;/\n-/D}' file

- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

Portable version for any version of sed

sed -e '/^-/{N' -e '/\
-/D' -e '}' file

If you also want to remove the last line when it starts with - (a trailing header with no paths after it)

sed -e '/^-/{$d' -e 'N' -e '/\
-/D' -e '}' file
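
For anyone puzzling over how these scripts work, here is a commented sketch of the first and the last variant. The function names are made up for illustration; the sed commands themselves are the ones shown above. The $d matters because, when a header is the very last line, N has nothing left to read: GNU sed by default then prints the pattern space and exits, so the path-less header would survive, whereas POSIX specifies quitting without printing it.

```shell
# Hypothetical wrappers (names are illustrative); both read the listing on stdin.

drop_empty_headers() {
  # /^-/   act only on header lines
  # N      append the next input line to the pattern space
  # /\n-/  the appended line is a header too, so the first header has no paths
  # D      delete up to the first newline and restart the cycle with what is
  #        left (the second header) without reading a new line
  sed '
    /^-/ {
      N
      /\n-/ D
    }
  '
}

drop_empty_headers_incl_last() {
  # Same script, but a header that is the last line of the input is deleted
  # before N is attempted.
  sed '
    /^-/ {
      $d
      N
      /\n-/ D
    }
  '
}

printf '%s\n' '- 2 equal files of size 1' '- 2 equal files of size 2' '  "a.iso"' |
  drop_empty_headers
# prints:
# - 2 equal files of size 2
#   "a.iso"
```

Because D restarts the cycle with the remaining header still in the pattern space, a run of several path-less headers in a row is handled as well, not just pairs.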

Upvotes: 2

Ed Morton

Reputation: 203189

sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk. If you are using sed constructs other than s, g, and p (with -n) then you are using constructs that became obsolete in the mid-1970s when awk was invented.

This will work robustly, efficiently, and portably with any awk on any UNIX box:

$ awk '/^ /{print p $0; p=""; next} {p=$0 ORS}' file
- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

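The same awk script, spread out with comments, as a sketch. The function name is made up for illustration; the awk program is the one above. It assumes, as in the sample, that every path line starts with a space and every other line is a header.

```shell
# Hypothetical wrapper (name is illustrative); reads the listing on stdin.
keep_headers_with_paths() {
  awk '
    # Path line: print the pending header (if any) followed by the path,
    # then clear the pending header so it is printed only once.
    /^ / { print p $0; p = ""; next }

    # Any other line is a header: remember it with a trailing newline.
    # A header that was never followed by a path is simply overwritten.
    { p = $0 ORS }
  '
}

printf '%s\n' '- 2 equal files of size 1' '- 2 equal files of size 2' '  "a.iso"' |
  keep_headers_with_paths
# prints:
# - 2 equal files of size 2
#   "a.iso"
```

Since a header is only ever printed when a path line arrives, a path-less header at the very end of the file is dropped too, with no special case needed.
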
Upvotes: 0

Hugues M.

Reputation: 20467

Is ~~pepsi~~ perl okay?

cat input.txt | perl -pe 'BEGIN{undef $/;} s/^-.*?\n-/-/smg'

The BEGIN block enables the multiline search by undefining the input record separator $/, so perl reads the whole input as a single string instead of line by line. Then the s/ part will substitute any part matching your regex with a - (no need for a capturing group).

Oh, and I slightly modified your regex to be non-greedy, with a ?. Otherwise, because the s flag lets . match newlines, it would match from the first - to the last one, and remove almost everything.

~~Edit: here is a lengthy and informative Q/A about multiline search, that shows it will be difficult with sed.~~

Edit2: actually quite easy with a modern sed, see @123's answer

Upvotes: 0

zwer

Reputation: 25769

You can just grep it:

grep -v -B1 "^-" test_file.txt | grep -v "\-\-"

- 2 equal files of size 284164096
  "C:\E\100p disk util bak\Softwares\OSs\gparted-live-0.26.1-1-i686.iso"
  "H:\Softwares\Linux\gparted-live-0.26.1-1-i686.iso"
- 2 equal files of size 161356649
  "H:\Softwares\Dev Tools\Eclipse\Windows\eclipse-java-luna-SR1a-win32-x86_64.zip"
- 35 equal files of size 97078976
  "C:\Windows\System32\DriverStore\FileRepository\nvacwu.inf_amd64_9934c34dc6ca0c4b\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvamwu.inf_amd64_d4715679184092a8\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvaowu.inf_amd64_785608ed2524cdea\NvCplSetupInt.exe"
  "C:\Windows\System32\DriverStore\FileRepository\nvblwu.inf_amd64_31f54e2d1ba058d5\NvCplSetupInt.exe"

How does it work? The first grep selects every line that doesn't start with a -, plus the one line before it (-B1). The second grep just removes the group separator (--) that grep prints between non-adjacent groups. Some grep versions support --no-group-separator, so you can do it in one go.
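
A sketch of both forms, assuming GNU grep for the --no-group-separator option (-B itself is also an extension beyond POSIX grep). The function names are made up for illustration. Note that grep -v "\-\-" drops any line containing two consecutive dashes, which could include a file path; the two-step version below only drops lines that consist of -- and nothing else.

```shell
# Hypothetical wrappers (names are illustrative); both read the listing on stdin.

keep_with_paths_gnu() {
  # Select non-header lines plus one line of leading context, and suppress
  # the "--" separator that grep would print between groups.
  grep --no-group-separator -v -B1 '^-'
}

keep_with_paths_two_step() {
  # Same selection, then remove only lines that are exactly "--".
  grep -v -B1 '^-' | grep -v '^--$'
}

printf '%s\n' '- A' '- B' '  "a--b.iso"' '- C' '- D' '  "c.iso"' |
  keep_with_paths_two_step
# prints:
# - B
#   "a--b.iso"
# - D
#   "c.iso"
```

Both forms treat any line that does not start with - as a path line, so blank lines in the input would also pull in the header before them.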

Upvotes: 1
