Ulf Tietze

Reputation: 57

sed remove string until next occurrence

Imagine that I have a chat log protocol. It could look like this:

MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF

I only want the chat between Anna and Bob, and I want each sender's name to appear only once, until the other chat partner starts talking.

What I've achieved so far is this sed script:

s/^MSG\s+(anna|bob)\|(anna|bob)\:\s{1}(.+)CRLF$/\1: "\3"/g
t end

/^.*/d

:end

This creates:

bob: "Hello anna"
bob: "How are you"
anna: "Im fine, you?"
bob: "Same, wanna hang out some time?"
anna: "Yes"
anna: "for sure"
anna: "maybe in a few weeks?"

But i want something similar to:

bob: 
  Hello anna
  How are you
anna:
  Im fine, you?
bob: 
  Same, wanna hang out some time?
anna: 
  Yes
  for sure
  maybe in a few weeks?

So, after the first bob, how can I delete all the following bobs until the next anna comes? Note that I have to use sed for this. It has to run on Ubuntu Linux systems with sed (GNU sed) 4.7, packaged by Debian.

Upvotes: 2

Views: 115

Answers (3)

user14473238

Reputation:

This uses POSIX sed syntax.

sed '
/^MSG \(anna\)|bob:/!{
  /^MSG \(bob\)|anna:/!d
}
s//\1:\
 /;s/CRLF$//;t t
:t
H;x;s/^\([^:]*:\n\).*\1//;t
g' file

It appends the current record to the previous one in the hold space and swaps the two buffers. If both records start with the same name, the substitution removes the previous record together with the duplicate name; otherwise g restores the pattern space to the current record.
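To make the H;x step more concrete, here is a small sketch (not part of the script above) that prints the pattern space right after the swap. The helper name show_swap and the sample records are made up for illustration; l is the standard sed command that prints the pattern space unambiguously.

```shell
# Hypothetical helper: show the pattern space right after "H;x" for each line.
# "l" prints newlines as \n and marks the end of the buffer with $.
show_swap() {
  sed -n 'H;x;l'
}

printf 'bob:1\nbob:2\nanna:3\n' | show_swap
# \nbob:1$          (hold space was empty, so only a leading newline is added)
# bob:1\nbob:2$     (previous record, newline, current record)
# bob:2\nanna:3$
```

After the swap the hold space contains only the current record, which is why a plain g is enough to get it back when no duplicate name is found.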

Here's a more efficient version:

sed '
t
/^MSG \(anna\)|bob:/!{
  /^MSG \(bob\)|anna:/!d
}
s//\1:\
 /;s/CRLF$//
H;s/:.*/:/
x;s/^\([^:]*:\n\)\1//p;D' file

This avoids the use of .* in the duplicate-detecting regexp by storing only the previous name in the hold space, rather than the entire previous record.
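As a rough illustration of why the anchored regexp is enough, the sketch below prints the pattern space after the H;s/:.*/:/;x sequence. The helper name show_name_swap and the simplified "name: text" input are assumptions made for this demo, and the \n in the replacement relies on GNU sed.

```shell
# Hypothetical helper: split "name: text" into "name:<newline>  text" (GNU sed),
# then apply the same H / s/:.*/:/ / x sequence and print the pattern space.
show_name_swap() {
  sed -n 's/: /:\n  /;H;s/:.*/:/;x;l'
}

printf 'bob: hi\nbob: yo\nanna: ok\n' | show_name_swap
# \nbob:\n  hi$
# bob:\nbob:\n  yo$     (same speaker: the name appears twice at the very start)
# bob:\nanna:\n  ok$    (new speaker: the previous name is followed by a different one)
```

Because the previous name always sits at the start of the buffer, a repeated speaker shows up as a doubled prefix, and no scan through the earlier message text is needed.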

Upvotes: 1

potong

Reputation: 58478

This might work for you (GNU sed):

sed -E '/^MSG ((anna)\|bob|(bob)\|anna): (.*)CRLF/{s//\2\3:\4/;H};$!d
       x;s/(\n.*:).*(\1.*)*/\1\n&/mg;s/\n+.*:(\S)/\n  \1/mg;s/.//' file

Turn on extended regexp -E.

Gather up the anna and bob conversations in the hold space.

At the end of the file, swap to the hold space, prepend each name to the conversation lines that follow it, remove the now-unwanted names, and space-indent each line of conversation under the prepended name.

Finally remove the first newline artefact.
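The gather-then-process idiom used here (H on every line, $!d until the last line, then x) can be tried in isolation. This is only a minimal sketch with a made-up helper name, join_lines, and it joins lines with commas instead of formatting a chat:

```shell
# Hypothetical helper: collect every line in the hold space, then process
# the whole buffer once on the last line.
join_lines() {
  # H    append each line to the hold space (adds a leading newline)
  # $!d  on every line except the last, stop here and print nothing
  # x    on the last line, fetch everything that was gathered
  # then turn newlines into commas and drop the leading artefact
  sed 'H;$!d;x;s/\n/,/g;s/^,//'
}

printf 'a\nb\nc\n' | join_lines
# a,b,c
```

The leading newline removed at the end is the same artefact that the final s/.// takes care of in the script above.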


An alternative solution (similar to KamilCuk's answer):

sed -E '/^MSG ((anna)\|bob|(bob)\|anna): (.*)CRLF/!d;s//\2\3:\4/;G
        /^([^:]*:)(.*)\n\1$/{s//  \2/p;d};h;s/:.*/:/p;x;s/[^:]*:/  /;P;d' file

Upvotes: 1

KamilCuk

Reputation: 141493

The following script:

cat <<EOF |
MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF
EOF
sed '
  # Preprocess: delete every line that is not part of the anna/bob conversation.
  /MSG \(\(anna\)|bob\|\(bob\)|anna\): \(.*\)CRLF/!d
  s//\2\3:\4/

  # Check whether this line has the same sender as the previous one.
  G   # Grab the previous name from hold space.
  /^\([^:]*\):\(.*\)\n\1$/{   # The names match?
    s//  \2/p                 # Print only the message.
    d
  }

  h    # Put the whole line into hold space. For later.
  s/^\([^:]*\):\([^\n]*\).*/\1/   # Extract only name from the line.
  x    # Put the name in hold space, and grab the full line from hold space.
  s//\1:\n  \2/     # Print the name with the message.
'

outputs:

bob:
  Hello anna
  How are you
anna:
  Im fine, you?
bob:
  Same, wanna hang out some time?
anna:
  Yes
  for sure
  maybe in a few weeks?
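If the two names should not be hard-coded, the same script can be wrapped in a shell function. This is only a sketch: chat_between is a hypothetical name, and it assumes both names are plain words without regex metacharacters or slashes, since they are pasted into the sed address as-is. It relies on GNU sed, like the script above.

```shell
# chat_between NAME1 NAME2: hypothetical wrapper, reads the log on stdin.
# Assumption: both names are plain words (no regex metacharacters, no "/").
chat_between() {
  # Address that selects only the messages exchanged between the two names.
  pat="MSG \(\($1\)|$2\|\($2\)|$1\): \(.*\)CRLF"
  # Only the address is double-quoted; the rest stays in single quotes.
  sed "/$pat/"'!d
    s//\2\3:\4/
    G
    /^\([^:]*\):\(.*\)\n\1$/{
      s//  \2/p
      d
    }
    h
    s/^\([^:]*\):\([^\n]*\).*/\1/
    x
    s//\1:\n  \2/
  '
}

printf 'MSG bob|anna: hiCRLF\nMSG bob|peter: psstCRLF\nMSG bob|anna: still there?CRLF\n' |
  chat_between anna bob
# bob:
#   hi
#   still there?
```

Keeping the sed commands in single quotes avoids having to escape $ and ! for the shell; only the address needs variable expansion.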

Upvotes: 3
