Steve

Reputation: 54572

perl multiline find and replace

I'm trying to use a simple perl one-liner on the following input:

@F7##########0/1
C4CTA6GCAAC56G67CTCA99C
+
b[[WZ56W]87X9HBB
@44FC6%%%%&&&&&&&1UP1
GTS4HY2IOMD3FCCA8DFLLLTG
+
]]^4YY23ZV\6`a8`^9^a

etc.

I would like my output to look like:

@F7##########0/1
C4CTA6GCAAC56G67CTCA99C
+F7##########0/1
b[[WZ56W]87X9HBB
@44FC6%%%%&&&&&&&1UP1
GTS4HY2IOMD3FCCA8DFLLLTG
+44FC6%%%%&&&&&&&1UP1
]]^4YY23ZV\6`a8`^9^a

etc.

I'd like to search for a line starting with @ and store (group) the remainder of that line in $1. I then want to find the next line starting with + and append $1 to the end of that line.

I've tried perl -pi -e "s%^@(.*)$\1\n(.*)$\2\n(\+)$\3\n%$1\n$2\n\+$1%mg" file.txt but I can't seem to match anything after ^@(.*)$\1\n.

Surely there's a working one-liner out there to accomplish this. Awk, sed, or tr one-liners are welcome, but changes to file.txt must be made in place, as file.txt is large and writing to another file is undesirable.

Upvotes: 1

Views: 689

Answers (4)

potong

Reputation: 58558

This might work for you:

sed '/^@/h;/^+/{G;s/\n@//}' file
@F7##########0/1
C4CTA6GCAAC56G67CTCA99C
+F7##########0/1
b[[WZ56W]87X9HBB
@44FC6%%%%&&&&&&&1UP1
GTS4HY2IOMD3FCCA8DFLLLTG
+44FC6%%%%&&&&&&&1UP1
]]^4YY23ZV\6`a8`^9^a

Upvotes: 0

jaypal singh

Reputation: 77175

Unfortunately, standard awk does not offer in-place editing (GNU awk 4.1 and later provide it through -i inplace), so it may not be what you need. If writing to a new file is acceptable, the following would work:

awk '/^@/{a=substr($0,2)}/^\+/{printf ("%s%s\n", $0,a);next}1' file > newfile
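
Since awk writes its result to standard output here, a common workaround is to write to a temporary file and then rename it over the original. A minimal sketch, assuming a POSIX-style shell with mktemp available; the function name add_ids_awk is made up for illustration:

```shell
# Hypothetical helper: run the awk program from this answer, then replace the input file.
add_ids_awk() {
  file=$1
  # Create the temporary file next to the original so mv stays on one filesystem.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if awk '/^@/{a=substr($0,2)} /^\+/{printf("%s%s\n", $0, a); next} 1' "$file" > "$tmp"
  then
    mv "$tmp" "$file"      # only overwrite the original if awk succeeded
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Called as add_ids_awk file.txt. Note that sed -i and perl -i also write a new file behind the scenes, so the disk usage is comparable. The file created by mktemp may carry different permissions from the original.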

Update: I have made an attempt to do what you are looking for in sed, which allows in-place changes.

sed -i '/^@/{h};/^\+/{x;s/\(.\)\(.*\)/+\2/}' file

Explanation:

  • /^@/{h} : We look for a line that starts with an @ sign and, when we find it, copy the entire line into hold space. Sed has two buffers, pattern space and hold space. Pattern space is where all the action takes place. Hold space lets us retain information temporarily so that we can act on it later.
  • /^\+/{x;... : When we find a line that starts with a +, we apply the x command to it. This exchanges the contents of pattern space and hold space, so the saved @ line is now in pattern space (and the + line is in hold space). Once we have done that, we do a simple substitution.
  • ...s/\(.\)\(.*\)/+\2/ : Here we capture characters using groups. The saved text has an @ in front of it, which you didn't want, so we isolate that character with . (any single character) and capture the rest of the line in a second group. The group parentheses need to be escaped, so you see \( \) instead of just ( ). In the replacement we emit a + followed by the second group, referenced as \2 (a backslash and the number of the group). The first group, which holds only the @, is dropped.
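
The hold-space steps above can be traced on a small sample. This is a minimal sketch, assuming a POSIX sed. It writes the script one command per line, uses /^+/ (a literal + in basic regular expressions), and uses s/^@/+/ as a shorter equivalent of the grouping substitution. The function name add_ids_sed is made up for illustration:

```shell
# Hypothetical helper: same logic as the sed one-liner, spelled out step by step.
#   /^@/h      copy each @ line into hold space
#   /^+/{ x    on a + line, swap pattern space and hold space
#   s/^@/+/    turn the saved "@id" into "+id", which is then printed
add_ids_sed() {
  sed '
    /^@/h
    /^+/{
      x
      s/^@/+/
    }
  ' "$@"
}

# Small demo on a two-record sample (reads standard input when no file is given).
printf '@id1\nACGT\n+\nqual\n@id2\nTTTT\n+\nzzzz\n' | add_ids_sed
```

Each + line in the demo comes out as +id1 and +id2 respectively. This relies on every + line being preceded by a fresh @ line, as in the sample input.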

Test for awk:

[jaypal:~/Temp] cat file
@F7##########0/1
C4CTA6GCAAC56G67CTCA99C
+
b[[WZ56W]87X9HBB
@44FC6%%%%&&&&&&&1UP1
GTS4HY2IOMD3FCCA8DFLLLTG
+
]]^4YY23ZV\6`a8`^9^a

[jaypal:~/Temp] awk '/^@/{a=substr($0,2)}/^\+/{printf ("%s%s\n", $0,a);next}1' file
@F7##########0/1
C4CTA6GCAAC56G67CTCA99C
+F7##########0/1
b[[WZ56W]87X9HBB
@44FC6%%%%&&&&&&&1UP1
GTS4HY2IOMD3FCCA8DFLLLTG
+44FC6%%%%&&&&&&&1UP1
]]^4YY23ZV\6`a8`^9^a

Test for sed:

You can use the -i option to make the changes in place. The following omits it, just for demo purposes, so that you can see the output.

[jaypal:~/Temp] sed '/^@/{h};/^\+/{x;s/\(.\)\(.*\)/+\2/}' file
@F7##########0/1
C4CTA6GCAAC56G67CTCA99C
+F7##########0/1
b[[WZ56W]87X9HBB
@44FC6%%%%&&&&&&&1UP1
GTS4HY2IOMD3FCCA8DFLLLTG
+44FC6%%%%&&&&&&&1UP1
]]^4YY23ZV\6`a8`^9^a

Upvotes: 3

Borodin

Reputation: 126762

My apologies. I read your question more carefully and see that you want to process your file line by line. This one-liner will achieve that:

perl -pe "$dat = $1 if /^\@(.+)/; s/^\+/+$dat/;" infile

Upvotes: 2

Borodin

Reputation: 126762

The program below appears to do what you need:

use strict;
use warnings;

my $str = <<'STR';
@F7##########0/1
C4CTA6GCAAC56G67CTCA99C
+
b[[WZ56W]87X9HBB
@44FC6%%%%&&&&&&&1UP1
GTS4HY2IOMD3FCCA8DFLLLTG
+
]]^4YY23ZV\6`a8`^9^a
STR

$str =~ s/^@(.+?)$(.+?)^\+/\@$1$2+$1/gms;

print $str;

OUTPUT

@F7##########0/1
C4CTA6GCAAC56G67CTCA99C
+F7##########0/1
b[[WZ56W]87X9HBB
@44FC6%%%%&&&&&&&1UP1
GTS4HY2IOMD3FCCA8DFLLLTG
+44FC6%%%%&&&&&&&1UP1
]]^4YY23ZV\6`a8`^9^a

Upvotes: 0
