Reputation: 985
I have a file containing 20736 lines. Every 81 lines represent the coordinates of the atoms of one molecule, so the file holds coordinates for 256 molecules in total. I want to select the coordinates for a specific part of every molecule. For example, within each 81-line block I want only lines 44 through 81, repeated for all 256 molecules.
To explain in more detail, I want to select lines
44-81 from 1-81 lines
126-163 from 82-163 lines
208-245 from 164-245 lines
290-327 from 246-327 lines
and so on until 20736 lines
To achieve this, I have tried a bash script like the one below:
#!/bin/bash
while read line
do
echo "$line"
done < malto-thermo-RT.set30.traj.pdbL1
But I am not sure how to implement a loop that selects only lines 44 through 81 from every subsequent 81-line block of the file.
I would appreciate some help.
If possible, I would also like solutions in python, awk, and perl for learning purposes.
Many thanks in advance.
Upvotes: 1
Views: 195
Reputation: 70822
Edited due to an error in the SO question.
Using modulo is surely the best way. The original idea in this SO question was added by @rici!
Unfortunately, the SO question is inconsistent: with "...from 82-163 lines" (inclusive), then "...from 164-245 lines", I count 82 lines per block, not 81.
At first, I just wanted to offer my bash + sed alternative solution.
Now corrected to better match the SO question, it may help to show where the bug is:
sed -nf <(for ((i=0;i<20736;i+=82));do echo $((i+44)),$(($i+81))p;done ) < file
Here bash generates the sed commands and sed does the job.
Step-by-step explanation
The bash portion:
for ((i=0;i<20736;i+=82)) ;do
echo $((i+44)),$(($i+81))p
done
produces:
44,81p
126,163p
208,245p
290,327p
...
20544,20581p
20626,20663p
20708,20745p
( Note: this matches the SO question's sample ranges exactly, but it does not end at line 20736!
echo $((20746000/82))
253000
So 253 blocks of 82 lines would need 20746 lines. If each block represents a molecule, there are only 252 full molecules in 20736 lines. )
So the sed script could be written:
sed -ne '44,81p;126,163p;208,245p;290,327p;...;20626,20663p;20708,20745p' <file
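For comparison, the same list of ranges can be generated in Python. This is only a minimal sketch and not part of the original answer: the function name `selection_ranges` and its parameters are hypothetical, and it assumes the 82-line stride implied by the question's sample ranges.

```python
def selection_ranges(total_lines, stride=82, first=44, last=81):
    """Return (start, end) line ranges, 1-based and inclusive.

    Mirrors the bash loop above: one range per block offset
    0, stride, 2*stride, ... below total_lines.
    """
    return [(offset + first, offset + last)
            for offset in range(0, total_lines, stride)]


ranges = selection_ranges(20736)
print(ranges[:4])    # first four ranges
print(ranges[-1])    # last range, which runs past line 20736
print(len(ranges))   # number of ranges generated
```

As with the bash loop, the last generated range ends beyond line 20736, which is one way to see the 81-versus-82 mismatch in the question.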
Upvotes: 1
Reputation: 241771
m % n (in many programming languages) is the "modulo" operator: the remainder left after the largest possible integer multiple of n is removed from m.
The lines you want to print are those for which the line number modulo 81 is at least 43. This works out better if the first line is counted as line 0; with that adjustment you want lines numbered 43-80, 124-161, 205-242, etc. (I think the OP has a small arithmetic error, but it might be an explanation error. The sequence here is based on the stanzas being 81 lines, as the OP says, rather than 82 lines as the example seems to indicate.)
So, in awk:
awk '(NR-1)%81 >= 43'
That relies on awk's default action, which is {print}, so I didn't have to supply one.
Edit: If the example ranges provided in the OP are correct (which they would be if, for example, a blank line separated the 81-line stanzas), then this could be changed to:
awk 'NR%82>43'
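Since the question also asks for Python, the modulo test might be sketched as below. This is an illustration under stated assumptions rather than a definitive implementation: `select_lines` and its parameters are hypothetical names, `period=82` follows the question's sample ranges, and passing `period=81` gives the strict 81-line-stanza reading instead.

```python
def select_lines(lines, period=82, first=44, last=81):
    """Yield lines whose 1-based position inside each block of
    `period` lines falls within first..last (inclusive)."""
    for number, line in enumerate(lines, start=1):
        position = (number - 1) % period + 1   # 1..period within the block
        if first <= position <= last:
            yield line


# Demo on line numbers instead of a real file
picked = list(select_lines(range(1, 351)))
print(picked[0], picked[37], picked[38])   # 44 81 126
```

Because a file object iterates line by line, the same generator can be fed an open file and its output written with `sys.stdout.writelines(...)`, so the whole file never has to be held in memory.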
Upvotes: 3
Reputation: 70822
Simple perl using @rici's modulo idea:
perl -ne 'print if $.%82>43' file
Upvotes: 1
Reputation: 8107
Your problem statement is fine, but you haven't tried hard. Check how a combination of the head and tail commands, and how to pass parameters to your script, can help you achieve what you want.
http://www.ss64.com/bash/head.html
http://www.ss64.com/bash/tail.html
For example,
$ cat file
line1
line2
line3
line4
line5
line6
line7
line8
line9
line10
In this example, we can print lines from 3 to 7 using:
$ head -7 file | tail -5
line3
line4
line5
line6
line7
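A rough Python counterpart of this head/tail slice uses `itertools.islice`. The helper name `line_range` is hypothetical, and the sample data simply reproduces the ten-line file shown above.

```python
from itertools import islice


def line_range(lines, start, end):
    """Return lines start..end (1-based, inclusive) from an iterable."""
    return list(islice(lines, start - 1, end))


sample = [f"line{i}" for i in range(1, 11)]
print(line_range(sample, 3, 7))
# ['line3', 'line4', 'line5', 'line6', 'line7']
```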
Upvotes: -1
Reputation: 85795
rici has the right idea in using the modulus operator, but as the record number increases his solution drifts progressively out of sync, as demonstrated by the following:
$ seq 350 | awk '(NR-1)%81==43{printf "%i",$0} (NR-1)%81==80{print " -",$0}'
44 - 81 # In sync
125 - 162 # Out of sync by 1
206 - 243 # Out of sync by 2
287 - 324 # Out of sync by 3
To print the lines you requested you would do:
$ awk 'NR%82>43' file
The printed ranges are:
$ seq 350 | awk 'NR%82==44{printf "%i",$0} NR%82==81{print " -",$0}'
44 - 81
126 - 163
208 - 245
290 - 327
Test it yourself with:
$ seq 350 | awk 'NR%82>43'
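The same drift check can be reproduced in Python. This is a minimal sketch in which `printed_ranges` and its parameters are hypothetical names; it scans line numbers the way the `seq 350 | awk ...` pipelines do and reports each selected range.

```python
def printed_ranges(upto, period, first=44, last=81):
    """Scan line numbers 1..upto and return the (start, end) ranges
    whose position within each `period`-line block is first..last."""
    ranges = []
    start = None
    for n in range(1, upto + 1):
        position = (n - 1) % period + 1   # 1..period within the block
        if position == first:
            start = n
        if position == last and start is not None:
            ranges.append((start, n))
            start = None
    return ranges


print(printed_ranges(350, 81))   # period 81: drifts away from the requested ranges
print(printed_ranges(350, 82))   # period 82: matches the requested ranges
```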
Upvotes: 1
Reputation: 41456
Using awk, you can do something like this:
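awk '
# print the last 38 lines (t-37 through t) of the current block
NR >= t-37 && NR <= t { print }
# once the end of the window is reached, move it to the next 82-line block
NR == t { t += 82 }
' t=81 file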
Upvotes: -1
Reputation: 50647
perl -ne '
BEGIN{ ($f,$t)=(44,81) }
($.==$f .. $.==$t) =~ /(E0|.)$/ or next;
print;
$1 eq "E0" or next;
$_ += 82 for $f,$t;
' file
Upvotes: 1
Reputation: 7092
Here's my naive, non-idiomatic crack at it using bash:
#!/bin/bash
file=/tmp/file
segment_size=81
select_offset=44
select_size=37
start_line=$select_offset
end_line=$(($start_line + $select_size))
i=0
while read line
do
i=$(($i + 1))
if [ $i -ge $start_line ]; then
[ $i -eq $start_line ] && [ $i != 1 ] && echo -e "\n-------------------\n"
if [ $i -le $end_line ]; then
echo "$line"
if [ $i -eq $end_line ]; then
start_line=$(($start_line + $segment_size + 1))
end_line=$(($start_line + $select_size))
fi
fi
fi
done < $file
Bash is certainly not my forte :\ :\ Seems to work tho!
Upvotes: 1