figos

Reputation: 21

Trimming a file with regular expressions / sed

I've got a file with several lines like this:

*wordX*-Sentence1.;Sentence2.;Sentence3.;Sentence4.

One of these Sentences may or may not contain wordX. What I want is to trim the file to make it look like this:

*wordX*-Sentence1.;Sentence2.

Here, Sentence3 was the first sentence to contain wordX, so it and everything after it were removed.

How can I do this with sed or awk?

Edit:

Here's a sample file:

*WordA*-This sentence does not contain what i want.%Neither does this one.;Not here either.;Not here.;Here is WordA.;But not here.
*WordB*-WordA here.;WordB here, time to delete everything.;Including this sentece.
*WordC*-WordA, WordB. %Sample sentence one.;Sample Sentence 2.;Sample sentence 3.;Sample sentence 4.;WordC.;Discard this.

And here is the desired output:

*WordA*-This sentence does not contain what i want.%Neither does this one.;Not here either.;Not here.
*WordB*-WordA here.
*WordC*-WordA, WordB. %Sample sentence one.;Sample Sentence 2.;Sample sentence 3.;Sample sentence 4.

Upvotes: 2

Views: 102

Answers (3)

anubhava

Reputation: 785058

This task is better suited to awk. Use the following awk command:

awk -F ";" '/^ *\*.*?\*/ {printf("%s;%s\n", $1, $2)}' inFile

This assumes that the words you are trying to match are always wrapped in asterisks (*).
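Note that the one-liner above always keeps the first two ;-separated fields, which matches the first example in the question but not the sample file added in the edit. A minimal sketch that cuts each line at the first sentence containing the keyword might look like the following. It assumes that the keyword is the text between the leading pair of asterisks, that it is followed by a -, and that ; is the only sentence separator; the function name trim_at_keyword is made up for illustration.

```shell
# Hypothetical helper: trim each line at the first ;-separated sentence
# that contains the keyword given between the leading asterisks.
trim_at_keyword() {
  awk '{
    # Lines without a leading *keyword*- header are printed unchanged.
    if (match($0, /^\*[^*]+\*-/) == 0) { print; next }
    word = substr($0, 2, RLENGTH - 3)   # keyword without the asterisks
    head = substr($0, 1, RLENGTH)       # the "*keyword*-" prefix
    body = substr($0, RLENGTH + 1)      # everything after the prefix
    n = split(body, s, ";")
    out = ""
    for (i = 1; i <= n; i++) {
      # index() does a literal substring search, so regex
      # metacharacters in the keyword need no escaping.
      if (index(s[i], word) > 0) break
      out = (i == 1) ? s[i] : out ";" s[i]
    }
    print head out
  }' "$@"
}

# Usage: trim_at_keyword inFile
```

Because index() looks for a plain substring, a keyword such as WordA would also match inside a longer word like WordAB; a stricter word-boundary check would be needed if that matters for your data.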

Upvotes: 1

jthill

Reputation: 60255

sed -r -e 's/\.;/\n/g' \
       -e 's/-/\n/' \
       -e 's/^(\*([^*]*).*\n)[^\n]*\2.*/\1/' \
       -e 's/\n/-/' \
       -e 's/\n/.;/g' \
       -e 's/;$//'

(edit: added the -:\n swaps to handle a match in the first sentence.)

Upvotes: 0

potong

Reputation: 58381

This might work for you (GNU sed):

sed -r 's/-/;/;:a;s/^(\*([^*]+)\*.*);[^;]*\2.*/\1;/;ta;s/;/-/;s/;$//' file

Convert the - following wordX to a ;. Delete the sentences containing wordX, along with everything after them, working from the back of the line to the front. Restore the original -. Delete the trailing ;.

Upvotes: 0
