Lucielle

Reputation: 53

Using sed to add line above a set of lines

EDIT BELOW

I'm new to bash scripting. Sorry if this has been answered elsewhere; I couldn't find it in any of my searches.

I'm using sed -i to add a line above every line that matches a pattern, for example:

for EFP in *.inp; do
    sed -i "/^O */i FRAGNAME=H2ODFT" "$EFP"
done

This works as expected, but I would like it to add the line only when the pattern holds across several consecutive lines, like so:

O
C
O
C
FRAGNAME=H2ODFT
O
H
H
FRAGNAME=H2ODFT
O
H
H

Notice there's no added line above the two O's that are followed by C's.

I tried the following:

for EFP in *.inp; do
    sed -i "/^O*\nH*\nH */i FRAGNAME=H2ODFT" "$EFP"
done

I was expecting the line to show up above each group of three lines that goes O - H - H, but nothing happened: sed passed through the file as if that pattern were nowhere in the document.

I've looked elsewhere and thought of using awk, but I can't wrap my head around it.

Any help would be greatly appreciated! L

EDIT

Thanks for the help. And sorry for being a bit unclear. I've tried a ton of things, too many to put in this post. I've tried awk, perl and sed solutions, but they're not working.

My input has a series of O's, C's and H's, with Cartesian coordinates assigned to them:

     C           36.116           34.950           34.657
     C           35.638           34.681           35.883
     C           36.134           33.569           36.703
     C           34.379           34.567           37.522
     N           34.579           35.375           36.476
     N           35.234           33.518           37.706
     O           37.045           32.745           36.559
     H           36.892           34.226           34.415
     O           35.234           38.803           30.513
     H           34.303           39.079           30.567
     C           33.490           35.015           38.608
     H           34.002           35.390           39.503
     H           32.894           34.170           38.974
     H           32.832           35.813           38.245
     C           35.342           32.708           38.920
     H           35.920           33.237           39.688
     H           35.942           31.802           38.772
     H           34.356           32.475           39.340
     O           30.226           35.908           36.744
     H           30.557           36.408           37.490
     H           30.642           36.311           35.982
     O           37.356           40.420           29.232
     H           36.473           40.786           29.286
     H           37.220           39.474           29.189
     O           40.889           37.054           35.401
     H           40.304           36.361           35.706
     H           41.620           36.587           34.995

I'm trying to insert a new line above a specific set of three lines, the OHH lines.

The awk solution posted didn't work, because it adds extra lines where there shouldn't be any when the stage gets reset. I'm looking for the following output:

 C           36.116           34.950           34.657
 C           35.638           34.681           35.883
 C           36.134           33.569           36.703
 C           34.379           34.567           37.522
 N           34.579           35.375           36.476
 N           35.234           33.518           37.706
 O           37.045           32.745           36.559
 H           36.892           34.226           34.415
 O           35.234           38.803           30.513
 H           34.303           39.079           30.567
 C           33.490           35.015           38.608
 H           34.002           35.390           39.503
 H           32.894           34.170           38.974
 H           32.832           35.813           38.245
 C           35.342           32.708           38.920
 H           35.920           33.237           39.688
 H           35.942           31.802           38.772
 H           34.356           32.475           39.340
 FRAGNAME=H2ODFT
 O           30.226           35.908           36.744
 H           30.557           36.408           37.490
 H           30.642           36.311           35.982
 FRAGNAME=H2ODFT
 O           37.356           40.420           29.232
 H           36.473           40.786           29.286
 H           37.220           39.474           29.189
 FRAGNAME=H2ODFT
 O           40.889           37.054           35.401
 H           40.304           36.361           35.706
 H           41.620           36.587           34.995

The ^tsed was a typo and should've been an indent instead of ^t

Upvotes: 2

Views: 159

Answers (4)

dawg

Reputation: 103814

Here is a Ruby script to do that:

ruby -e 'lines=$<.read.split(/\R/)
lines.each_with_index{|line,i| 
    three_line_tag=lines[i..i+2].map{|sl| sl.split[0] }.join
    puts "FRAGNAME=H2ODFT" if three_line_tag == "OHH"
    puts line
}
' file 

Or, with any awk, using the same method:

awk '{lines[NR]=$0}
END{
    for(i=1;i<=NR;i++) {
        tag=""
        for(j=0;j<=2;j++) {
            split(lines[i+j],arr)
            tag=tag arr[1]
        }
        if (tag=="OHH")
                print "FRAGNAME=H2ODFT"
        print lines[i]
    }
}
' file 

Or Perl:

perl -0777 -pe 's/(^\h*O\h.*\R^\h*H\h.*\R^\h*H\h.*\R?)/FRAGNAME=H2ODFT\n$1/gm' file

Any of these prints:

    C           36.116           34.950           34.657
    C           35.638           34.681           35.883
    C           36.134           33.569           36.703
    C           34.379           34.567           37.522
    N           34.579           35.375           36.476
    N           35.234           33.518           37.706
    O           37.045           32.745           36.559
    H           36.892           34.226           34.415
    O           35.234           38.803           30.513
    H           34.303           39.079           30.567
    C           33.490           35.015           38.608
    H           34.002           35.390           39.503
    H           32.894           34.170           38.974
    H           32.832           35.813           38.245
    C           35.342           32.708           38.920
    H           35.920           33.237           39.688
    H           35.942           31.802           38.772
    H           34.356           32.475           39.340
FRAGNAME=H2ODFT
    O           30.226           35.908           36.744
    H           30.557           36.408           37.490
    H           30.642           36.311           35.982
FRAGNAME=H2ODFT
    O           37.356           40.420           29.232
    H           36.473           40.786           29.286
    H           37.220           39.474           29.189
FRAGNAME=H2ODFT
    O           40.889           37.054           35.401
    H           40.304           36.361           35.706
    H           41.620           36.587           34.995

===

Edit in place:

Read THIS about awk; the advice there is generally applicable.

As written, all of these scripts write to stdout.

You can redirect the output to a new file:

someutility input_file >new_file

Some tools (such as perl, ruby, GNU awk and GNU sed) can also replace the file in place. If you don't have that option, you cannot do:

someutil 'prints to STDOUT' file >file

since the shell truncates file before the utility has read it.

Instead you would do:

someutil 'prints to STDOUT' file > tmp && mv tmp file
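
As a minimal sketch, the temp-file pattern above could be wrapped in a small shell function and run over every .inp file. The function name insert_fragname is hypothetical, and the awk filter is a simplified take on the buffer-then-scan method shown earlier (it compares the first field of three consecutive lines). It assumes mktemp is available. Note that the replaced file takes on the temp file's permissions, which mktemp typically restricts to the owner.

```shell
# insert_fragname is a hypothetical helper name, not part of any tool.
insert_fragname() {
    file=$1
    tmp=$(mktemp) || return 1   # scratch file for the filtered output
    # Buffer every line with its first field, then print the marker
    # before each line that starts an O, H, H run.
    if awk '
        { line[NR] = $0; tag[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                if (tag[i] == "O" && tag[i+1] == "H" && tag[i+2] == "H")
                    print "FRAGNAME=H2ODFT"
                print line[i]
            }
        }' "$file" > "$tmp"
    then
        mv "$tmp" "$file"       # replace the original only if awk succeeded
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage, mirroring the loop in the question:
#   for EFP in *.inp; do insert_fragname "$EFP"; done
```

Writing to the temp file first means a failed awk run leaves the original file untouched.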

Upvotes: 1

Stephen Quan

Reputation: 25936

I know you requested a sed solution, but I have a solution based on awk.

  • We initialize the awk program with a stage which, over time, tracks the progress towards "OHH"
  • If we receive the next expected letter, we grow the stage until we get OHH; then we print your required string and reset the stage
  • If we encounter a breakage, we print whatever we accumulated in stage and reset it; if the breaking line is itself an O, it starts a new stage

awk '
BEGIN { stage="" }
/^O$/ { if (stage=="") { stage="O\n"; next } }
/^H$/ { if (stage=="O\n") { stage="O\nH\n"; next } }
/^H$/ { if (stage=="O\nH\n") { print "FRAGNAME=H2ODFT\nO\nH\nH"; stage=""; next } }
{ printf "%s", stage; stage="" }   # breakage: flush the partial match
/^O$/ { stage="O\n"; next }         # the breaking line may start a new O-H-H run
{ print $1 }
END { printf "%s", stage }         # flush a partial match left at end of input
' < sample.txt

Where sample.txt contains:

O
C
O
C
O
H
H
O
H
H
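
The sample above has one letter per line, whereas the edited question shows lines with leading whitespace and coordinate columns. A sketch of the same staging idea for that layout, keyed on the first field and buffering whole lines so the coordinates are preserved, might look like this. The name mark_water is hypothetical, and the code assumes the element symbol is always the first whitespace-separated field.

```shell
# mark_water is a hypothetical name; it reads the files given as arguments, or stdin.
mark_water() {
    awk '
        # Print the buffered lines unchanged and empty the buffer.
        function dump(    i) {
            for (i = 1; i <= n; i++) print buf[i]
            n = 0
        }
        # An O always starts a new candidate run.
        $1 == "O" { dump(); buf[1] = $0; n = 1; next }
        # An H extends a run in progress; the second H completes it.
        $1 == "H" && n > 0 {
            buf[++n] = $0
            if (n == 3) { print "FRAGNAME=H2ODFT"; dump() }
            next
        }
        # Anything else breaks the run.
        { dump(); print }
        END { dump() }
    ' "$@"
}
```

Because at most three lines are held at a time, this streams through the file instead of loading it whole.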

Upvotes: 0

potong

Reputation: 58391

This might work for you (GNU sed):

sed -Ei -e ':a;N;s/\n/&/2;Ta;/^O(\nH)\1$/i FRAGNAME=H2ODFT' -e 'P;D' file1 file2 

Slide a 3-line window through the file and, if the required pattern matches, insert the desired line of text.

N.B. The \1 back reference matches the line before. Also, the script is in two separate pieces because the i command must be terminated by a newline, which the -e option provides.

An alternative version of the same solution:

cat <<\! | sed -Ef - -i file{1..100}
:a
N
s/\n/&/2
Ta
/^O(\nH)\1$/i FRAGNAME=H2ODFT
P
D
!
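
For the coordinate-style input in the edited question, the same 3-line window could be kept and only the match pattern widened. The following is a sketch under assumptions, not a tested replacement for the script above: the name mark_water_sed is hypothetical, each element symbol is assumed to be followed by at least one space or tab, and GNU sed is required (for T, the one-line form of i, and \n inside a bracket expression).

```shell
# mark_water_sed is a hypothetical name; the script relies on GNU sed extensions.
mark_water_sed() {
    sed -E \
        -e ':a;N;s/\n/&/2;Ta' \
        -e '/^[[:blank:]]*O[[:blank:]][^\n]*\n[[:blank:]]*H[[:blank:]][^\n]*\n[[:blank:]]*H[[:blank:]][^\n]*$/i FRAGNAME=H2ODFT' \
        -e 'P;D' "$@"
}
```

Each of the three sub-patterns matches one whole line of the window: optional indentation, the element symbol, a blank, then anything up to the newline.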

Upvotes: 1

Sundeep

Reputation: 23667

If the input files aren't large enough to cause memory issues, you can slurp the entire file and then perform the substitution. For example:

perl -0777 -pe 's/^O\nH\nH\n/FRAGNAME=H2ODFT\n$&/gm' ip.txt

If this works for you, then you can add the -i option for in-place editing. The regex ^O*\nH*\nH * shown in the question isn't clear. ^O\nH\nH\n will match three lines containing exactly O, H and H. Adjust as needed.
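
One likely reason the attempt in the question matched nothing: by default sed holds a single input line in its pattern space, with the trailing newline removed, so a regex containing \n has nothing to match unless further lines are appended first (for example with N). A small illustration, using made-up three-line input:

```shell
# Without N the pattern space holds one line, so \n never matches.
one_line=$(printf 'O\nH\nH\n' | sed -n '/^O\nH/p')

# With N the next line is appended first, so the embedded newline can match.
two_lines=$(printf 'O\nH\nH\n' | sed -n 'N;/^O\nH/p')

printf 'without N: [%s]\n' "$one_line"
printf 'with N:    [%s]\n' "$two_lines"
```

Slurping the file with perl -0777 sidesteps this, because the whole file becomes one string with its newlines intact.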

Upvotes: 0
