Reputation: 21
My testdata
aa1
bb1
cc1
aa2
bb2
cc2
aa3
bb3
cc3
aa4
bb4
cc4
aa5
bb5
cc5
aa6
bb6
cc6
aa7
bb7
cc7
aa8
bb8
cc8
Let's say I wish to extract lines 4-6 (aa2-cc2) into a file, then skip 6 lines and extract lines 13-15 (aa5-cc5), followed by the same skipping of 6 lines. The process repeats until the end of the file. I have written a bash script which works just fine for small files.
#!/bin/bash
# i = 2, 5, 8 -> extract lines 3*i-2 .. 3*i, i.e. 4-6, 13-15, 22-24
for i in {2..8..3}; do
    sed -n "$((3*i-2))","$((3*i))"p testdata > "$i".part
done
Now that I am dealing with a giant file of 30 GB, my script is bad for the hard disk, as it reads the same file thousands of times. I would like to avoid wearing out the HDD by reading the file (and extracting my parts) only once. Is there a one-liner that can solve my problem?
I am not really a programmer, so please bear with any terminology mix-ups in my question. Thank you for your help!
Upvotes: 2
Views: 70
Reputation: 47189
You could do the loop inside sed, e.g. with GNU sed:
# Skip first 3 lines, extract 3 lines and skip 6
sed -n '4~9 { N; N; p }'
Example use:
seq 40 | sed -n '4~9 { N; N; p }'
Output:
4
5
6
13
14
15
22
23
24
31
32
33
Note that this solution only prints whole text blocks. If there are not enough lines for the final block, it is not printed at all; in the example above, the incomplete block starting at line 40 is omitted.
4~9 tells sed to execute the code block { N; N; p } on line 4 and then on every 9th line after that; on each of those lines we fetch 2 more lines into the pattern space (N; N) and then print them all (p).
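Applied to the question's data in a single pass, this would look something like the following (a minimal sketch; testdata is the input file name from the question and extracted.part is just an illustrative output name):
sed -n '4~9 { N; N; p }' testdata > extracted.part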
Upvotes: 3
Reputation: 16997
IIUC, you want to extract ranges of lines and write them to files. If so, and if you can create one more file listing the ranges of records to extract, then you may try the one below.
Say you have a file named extract with the ranges of interest:
$ cat extract
4-6
13-15
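For the repeating pattern in the question (3 lines starting at line 4, then every 9th line), such a range file could be generated rather than typed by hand; a minimal sketch, assuming an illustrative upper bound of 72 input lines:
awk 'BEGIN { for (start = 4; start <= 72; start += 9) print start "-" (start + 2) }' > extract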
This is your input file
$ cat file
aa1
bb1
cc1
aa2
bb2
cc2
aa3
bb3
cc3
aa4
bb4
cc4
aa5
bb5
cc5
aa6
bb6
cc6
aa7
bb7
cc7
aa8
bb8
cc8
If you execute the command below:
$ awk -F'[- ]' 'FNR==NR{rules[FNR,"min"]=$1;rules[FNR,"max"]=$2;m=FNR;next}function is_in_list(i){for(i=1; i <=m; i++)if(FNR>=rules[i,"min"] && FNR<=rules[i,"max"])return rules[i,"min"]"-"rules[i,"max"]".txt"}{file=is_in_list()}file{ if(file in arr){ print >>file }else{ print >file; arr[file] } close(file) }' extract file
You get:
$ ls *.txt
13-15.txt 4-6.txt
Contents of each file are as follows:
$ cat 4-6.txt
aa2
bb2
cc2
$ cat 13-15.txt
aa5
bb5
cc5
In case you just want to list the lines:
$ awk -F'[- ]' 'FNR==NR{rules[FNR,"min"]=$1;rules[FNR,"max"]=$2;m=FNR;next}function is_in_list(i){for(i=1; i <=m; i++)if(FNR>=rules[i,"min"] && FNR<=rules[i,"max"])return rules[i,"min"]"-"rules[i,"max"]".txt"}is_in_list()' extract file
aa2
bb2
cc2
aa5
bb5
cc5
More readable version of the write-to-individual-files command:
awk -F'[- ]' '
# First pass: load the ranges from the extract file
FNR==NR{
    rules[FNR,"min"]=$1;
    rules[FNR,"max"]=$2;
    m=FNR;
    next
}
# Return the target file name if the current line falls in any range
function is_in_list(i)
{
    for(i=1; i<=m; i++)
        if(FNR>=rules[i,"min"] && FNR<=rules[i,"max"])
            return rules[i,"min"]"-"rules[i,"max"]".txt"
}
{
    file=is_in_list()
}
# Truncate the output file on its first write, append afterwards
file{
    if(file in arr){
        print >>file
    }
    else{
        print >file;
        arr[file]
    }
    close(file)
}
' extract file
More readable version of listing the lines for the given ranges:
awk -F'[- ]' '
FNR==NR{
    rules[FNR,"min"]=$1;
    rules[FNR,"max"]=$2;
    m=FNR;
    next
}
function is_in_list(i)
{
    for(i=1; i<=m; i++)
        if(FNR>=rules[i,"min"] && FNR<=rules[i,"max"])
            return rules[i,"min"]"-"rules[i,"max"]".txt"
}
is_in_list()
' extract file
Upvotes: 1
Reputation: 18697
In GNU sed it's possible to use first~step line addressing:
sed -n '4~9p; 5~9p; 6~9p' file
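Example use (note that, unlike the N; N variant in the other sed answer, this one also prints a partial final block, i.e. line 40 here):
seq 40 | sed -n '4~9p; 5~9p; 6~9p'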
Upvotes: 3
Reputation: 247042
A single pass through the file is all that's required. Plus a little arithmetic.
awk '{n = NR % 9} 4 <= n && n <= 6' file
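If you also want each 3-line block written to its own file in that same single pass, a minimal sketch (testdata is the input name from the question; the N-M.part output names are just illustrative):
awk '{n = NR % 9} 4 <= n && n <= 6 { f = (NR - n + 4) "-" (NR - n + 6) ".part"; print > f; if (n == 6) close(f) }' testdata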
Upvotes: 3