Reputation: 21
I've got a file with several lines like this:
*wordX*-Sentence1.;Sentence2.;Sentence3.;Sentence4.
One of these Sentences may or may not contain wordX. What I want is to trim the file to make it look like this:
*wordX*-Sentence1.;Sentence2.
Here, Sentence3 was the first sentence to contain wordX.
How can I do this with sed/awk?
Edit:
Here's a sample file:
*WordA*-This sentence does not contain what i want.%Neither does this one.;Not here either.;Not here.;Here is WordA.;But not here.
*WordB*-WordA here.;WordB here, time to delete everything.;Including this sentece.
*WordC*-WordA, WordB. %Sample sentence one.;Sample Sentence 2.;Sample sentence 3.;Sample sentence 4.;WordC.;Discard this.
And here is the desired output:
*WordA*-This sentence does not contain what i want.%Neither does this one.;Not here either.;Not here.
*WordB*-WordA here.
*WordC*-WordA, WordB. %Sample sentence one.;Sample Sentence 2.;Sample sentence 3.;Sample sentence 4.
Upvotes: 2
Views: 102
Reputation: 785058
This task is better suited to awk. Use the following awk command:
awk -F ";" '/^ *\*.*?\*/ {printf("%s;%s\n", $1, $2)}' inFile
This assumes that the words you are trying to match are always wrapped in asterisks (*).
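The command above keeps the first two ;-separated fields of each matching line, which fits the one-line example in the question. For the sample file, where the cut point varies per line, a minimal sketch that stops at the first sentence containing the word might look like the following. The function name trim_at_word is hypothetical, and the sketch assumes every line starts with a *word*- header, that sentences are separated by ; only, and that a plain substring match on the word is acceptable (so WordA would also match inside WordAB).

```shell
# Hypothetical helper (not from the answer): reads lines on stdin and cuts
# each one at the first ";"-separated sentence containing the *word* key.
trim_at_word() {
  awk '
    {
      # Lines without a leading *word*- header pass through unchanged.
      if (!match($0, /^\*[^*]+\*-/)) { print; next }
      key = substr($0, 2, RLENGTH - 3)    # text between the asterisks
      out = substr($0, 1, RLENGTH)        # header, including the "-"
      n   = split(substr($0, RLENGTH + 1), s, ";")
      for (i = 1; i <= n; i++) {
        if (index(s[i], key)) break       # first sentence with the key
        sep = (i > 1) ? ";" : ""
        out = out sep s[i]
      }
      print out
    }'
}

trim_at_word <<'EOF'
*WordB*-WordA here.;WordB here, time to delete everything.;Including this sentece.
EOF
# prints: *WordB*-WordA here.
```

If the word already appears in the first sentence, this sketch prints only the header (for example *WordB*-).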
Upvotes: 1
Reputation: 60255
sed -r -e 's/\.;/\n/g' \
-e 's/-/\n/' \
-e 's/^(\*([^*]*).*\n)[^\n]*\2.*/\1/' \
-e 's/\n/-/' \
-e 's/\n/.;/g' \
-e 's/;$//'
(edit: added the swaps between - and \n to handle a match in the first sentence.)
Upvotes: 0
Reputation: 58381
This might work for you (GNU sed):
sed -r 's/-/;/;:a;s/^(\*([^*]+)\*.*);[^;]*\2.*/\1;/;ta;s/;/-/;s/;$//' file
Convert the - following wordX to a ;. Repeatedly delete from the last sentence containing wordX to the end of the line (working from the back to the front of the line). Restore the original -. Delete the trailing ;.
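The steps described above can be spelled out as a commented multi-line script. This is only a sketch for GNU sed: the function name trim_with_sed is hypothetical, and it uses [^;]* in the loop so that a sentence which begins with wordX (as in the WordB and WordC sample lines) is also caught.

```shell
# Hypothetical wrapper: reads lines on stdin, writes trimmed lines on stdout.
trim_with_sed() {
  sed -r '
    # 1. Turn the first "-" (the one after *wordX*) into ";"
    s/-/;/
    # 2. Loop: cut from the last sentence containing wordX to end of line,
    #    leaving a trailing ";"; repeat while a substitution was made
    :a
    s/^(\*([^*]+)\*.*);[^;]*\2.*/\1;/
    ta
    # 3. Restore the original "-"
    s/;/-/
    # 4. Drop the trailing ";" left behind by the cut
    s/;$//
  '
}

trim_with_sed <<'EOF'
*WordA*-Not here either.;Not here.;Here is WordA.;But not here.
EOF
# prints: *WordA*-Not here either.;Not here.
```

Because step 1 rewrites the first - on the line, this assumes wordX itself contains no hyphen.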
Upvotes: 0