Hanna

Reputation: 41

SED: Deleting text between two strings, repeated across the line

The issue is that I wish to remove all text between two strings on a line using sed. I understand the use of sed -i 's/str1.*str2//' file.dat to remove the text between str1 and str2, inclusive of str1 and str2, but my line has str1 and str2 repeated many times, and I would like to remove the text between each pair. My attempt above removes all text between the first instance of str1 and the last instance of str2. I would appreciate some help in understanding how to do this.

In addition, I would like to repeat this across all lines in the file, and I do not know how many times the str1, str2 pair appears on each line. It varies.

Kind Regards

Additional Edit - hope not into a flame-wall!

An example may be of use; I am having trouble understanding the answers thus far, sorry guys.

For a single line in a file example.dat:

bla.bla.TextOfUnknownLength.bla.bla 1023=3 290=1 336=17 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla 1023=20 290=2 336=7 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla ...

I wish to remove from 1023= to 278= inclusive (but not the 0 after 278=) in all instances. The text between 1023= and 278= can occur many times in a line and is of unknown length.

There are also many lines in the file, and I would like to run this across all lines.

HS

Upvotes: 4

Views: 8713

Answers (3)

Marc Bredt

Reputation: 955

sed -ri 's/(foo)(.*)(bar)/\1\3/g' between.file

Explanation: use extended regular expressions (-r) to capture the part before, between and after in your line, then replace the match with the prefix \1 and the suffix \3, using sed's backreferences with leading backslashes.

UPDATE: Consider between.file contains the following contents.

foo---1---bar
foo---2---bar
foo---3---bar

Then the command above removes the contents between foo and bar, so the output looks like

foobar
foobar
foobar

Wasn't that your desired output/change in your file?

UPDATE: I think awk fits better for your needs.

Assume between.file contains the following lines

A foo---1---bar B foo---10--bar C 
A foo---2---bar D foo---20--bar E 
A foo---3---bar B foo---30---bar C 

this script

#!/bin/bash
awk '{                            
                 all="";
                 for(i=1; i<=NF; i++) {   # fields start at $1; $0 is the whole line
                   if(!($i~/foo.*bar/)) { all=all" "$i; } 
                 };                            
                 print all;
               }' between.file

will produce the following output

 A B C
 A D E
 A B C

You could use this to implement some kind of DFA that switches into a specific state when reading 1023= and leaves that state when reading 278=.

Redirect the output to a new file, or search the docuMANtation for awk for a way to process a file directly. Hope this helps.
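
The state-switching idea could be sketched roughly as follows for the tokens in the question. This is only a minimal sketch, assuming fields are separated by whitespace and that every 1023= field is followed by a 278= field on the same line; the function name strip_pairs is made up for illustration.

```shell
# strip_pairs is a hypothetical helper name, not part of awk.
# It reads lines on stdin and drops every field from one starting with
# "1023=" up to and including the "278=" prefix of the closing field,
# keeping whatever follows "278=" (the 0 in the question).
strip_pairs() {
  awk '{
    out = ""; skipping = 0
    for (i = 1; i <= NF; i++) {
      if (!skipping && $i ~ /^1023=/) { skipping = 1; continue }
      if (skipping) {
        if ($i ~ /^278=/) {            # closing field: leave the skip state
          skipping = 0
          out = out (out == "" ? "" : " ") substr($i, 5)
        }
        continue
      }
      out = out (out == "" ? "" : " ") $i
    }
    print out
  }'
}

printf '%s\n' 'head 1023=3 290=1 276=K 278=0 mid 1023=20 290=2 278=0 tail' | strip_pairs
```

Because awk rebuilds the line from its fields, runs of whitespace are collapsed to single spaces in the output. Something like strip_pairs < example.dat > example.new would write the result to a new file.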

Upvotes: 2

potong

Reputation: 58420

This might work for you (GNU sed):

sed -r ':a;s/([^\n]*)(foo)[^\n]+(bar)/\1\n\2\3/;ta;s/\n//g' file

Use greed, a unique delimiter and a loop to remove the characters between foo and bar. The greed works backwards through the line, and the delimiter stops the part of the line that has already been processed from being processed again. The loop handles one or more occurrences of foo through bar.
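
For the tokens in the question, the unique-delimiter idea can also be tried without a loop. This is a different, simpler variant rather than the command above, and only a sketch: it assumes GNU sed (for \n in the replacement and inside a bracket expression) and that 1023= and 278= never appear as part of a longer number. The function name strip_between is made up for illustration.

```shell
# strip_between is a hypothetical helper name used only for this sketch.
# 1. Turn every "278=" into a newline, which cannot otherwise occur in the line.
# 2. Delete from each "1023=" up to the nearest newline marker, marker included.
# 3. Restore any marker that had no "1023=" in front of it.
strip_between() {
  sed 's/278=/\n/g; s/1023=[^\n]*\n//g; s/\n/278=/g'
}

printf '%s\n' 'head 1023=3 290=1 276=K 278=0 mid 1023=20 290=2 278=0 tail' | strip_between
```

Once the printed output looks right, the same script could be passed to sed -i to edit example.dat in place.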

Upvotes: 0

NeronLeVelu

Reputation: 10039

Just add the g at the end of your sed command.

sed -i 's/str1.*str2//g' file.dat 
  • g means: for each occurrence in the current buffer, which by default is the current line.
  • sed works on one line at a time by default; at the end of the action, it continues with the next line.

Remarks on this:

  • if str1 and str2 are not on the same line, nothing between them is changed
  • str1 and str2 are part of the pattern, so some special characters sometimes need to be escaped (such as (, {, [, \, &, ^, .), depending on the wanted behaviour.
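
Before relying on the g flag alone, it may be worth checking the command on a line with more than one pair. The .* in the pattern is greedy, so the first match already runs from the first str1 to the last str2, and g then finds nothing further to replace. A small check, using a made-up sample line:

```shell
# Made-up sample line with two str1...str2 pairs.
line='a str1 X str2 b str1 Y str2 c'

# Without -i, sed prints the result instead of editing a file.
# The " b " between the two pairs is removed along with both pairs.
printf '%s\n' "$line" | sed 's/str1.*str2//g'
```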

Upvotes: 0
