Reputation: 355
I am working with a CSV file that is the output of a gas chromatograph data analyzer, so I can only manipulate what is provided. I need to remove the unnecessary lines from the CSV file, or equivalently keep only the necessary ones. There are 960 lines in the actual file.
The first 8 lines in the file look like this:
[Line 1] Remove
[Line 2] Remove
[Line 3] Keep
[Line 4] Remove
[Line 5] Remove
[Line 6] Remove
[Line 7] Keep
[Line 8] Keep
The pattern of line ranges I want to keep/remove continues for hundreds of lines, so here are the next 8 lines as an example:
[Line 9] Remove
[Line 10] Remove
[Line 11] Keep
[Line 12] Remove
[Line 13] Remove
[Line 14] Remove
[Line 15] Keep
[Line 16] Keep
No string pattern distinguishes these lines; only the line numbers themselves do. I would like to avoid having to calculate the ranges for hundreds of lines and put them all into sed, as in the command shown below, which handles only the first 8 lines.
sed '1,2d; 4,6d' test.csv >> cut_test.csv
I am hoping for the following:
[Line 3] Keep
[Line 7] Keep
[Line 8] Keep
[Line 11] Keep
[Line 15] Keep
[Line 16] Keep
Upvotes: 2
Views: 434
Reputation: 58558
This might work for you (GNU sed):
sed -n 'n;n;p;n;n;n;n;p;n;p' file
Does as it says on the tin.
Better (already mentioned by Thor):
sed -n '3~8p;7~8,+1p' file
Upvotes: 1
Reputation: 12456
If the line numbers to keep follow exactly the pattern you describe in your question (repeating every 8 lines), you can use the following GNU sed command:
$ sed '1~8d;2~8d;4~8d;5~8d;6~8d;' input.csv
[Line 3] Keep
[Line 7] Keep
[Line 8] Keep
[Line 11] Keep
[Line 15] Keep
[Line 16] Keep
and redirect the output to a new file, or use -i.back to change the file in place (keeping a backup with the .back suffix).
Explanation:
1~8d will execute the d command on the 1st line, 9th line, ...
2~8d will execute the d command on the 2nd line, 10th line, ...
input.csv:
$ cat input.csv
[Line 1] Remove
[Line 2] Remove
[Line 3] Keep
[Line 4] Remove
[Line 5] Remove
[Line 6] Remove
[Line 7] Keep
[Line 8] Keep
[Line 9] Remove
[Line 10] Remove
[Line 11] Keep
[Line 12] Remove
[Line 13] Remove
[Line 14] Remove
[Line 15] Keep
[Line 16] Keep
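To make the first~step addressing concrete, here is a small Python sketch that mimics which line numbers the delete commands above select. It is not part of sed: matches_step and kept_lines are hypothetical helper names, and the sketch assumes step > 0 only.

```python
def matches_step(lineno, first, step):
    """Mimic GNU sed's first~step address for step > 0 (hypothetical helper)."""
    return lineno >= first and (lineno - first) % step == 0


def kept_lines(total, deleted_firsts, step=8):
    """Return the 1-based line numbers that survive all the first~step deletes."""
    return [
        n for n in range(1, total + 1)
        if not any(matches_step(n, first, step) for first in deleted_firsts)
    ]


# Same addresses as '1~8d;2~8d;4~8d;5~8d;6~8d' applied to a 16-line file
print(kept_lines(16, [1, 2, 4, 5, 6]))  # [3, 7, 8, 11, 15, 16]
```

The surviving numbers are the same six lines that the sed command prints for the sample input.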
You can even simplify the command by regrouping everything in the following way (that is close to your command):
$ sed '1~8,2~8d;4~8,6~8d;' input.csv
[Line 3] Keep
[Line 7] Keep
[Line 8] Keep
[Line 11] Keep
[Line 15] Keep
[Line 16] Keep
As mentioned by Thor, you can shorten the command if, instead of deleting the lines you want to remove, you just print the lines you want to keep:
$ sed -n '3~8p;7~8,8~8p;' input.csv
[Line 3] Keep
[Line 7] Keep
[Line 8] Keep
[Line 11] Keep
[Line 15] Keep
[Line 16] Keep
Upvotes: 5
Reputation: 143
The sed solution is elegant, but as you also tagged Python, here's an equivalent solution in that language. It should scale to enormous files if that ever becomes necessary, because it iterates over the file object line by line instead of reading the entire file at once (which I believe is true of the sed solution too):
import itertools

# Keep/remove flags for each block of 8 lines
pattern = [False, False, True, False, False, False, True, True]
with open('input.csv', 'r') as in_file, open('output.csv', 'w') as out_file:
    # Iterating over in_file (not in_file.readlines()) reads lines lazily
    out_file.writelines(line for line, keep in zip(in_file, itertools.cycle(pattern)) if keep)
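The same cycle-based filter can be tried without touching the disk. The sketch below is only an illustration: filter_by_pattern is a hypothetical name, the sample data is generated in memory with io.StringIO, and itertools.compress is swapped in for the zip/if pairing.

```python
import io
import itertools


def filter_by_pattern(lines, pattern):
    """Lazily yield the lines whose slot in the repeating pattern is truthy."""
    return itertools.compress(lines, itertools.cycle(pattern))


# In-memory stand-in for the real CSV file: 16 numbered lines
sample = io.StringIO("".join(f"[Line {n}]\n" for n in range(1, 17)))
pattern = [False, False, True, False, False, False, True, True]

kept = list(filter_by_pattern(sample, pattern))
print("".join(kept), end="")
```

Because compress consumes its input one item at a time, passing an open file object instead of the StringIO keeps the line-by-line behaviour.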
Upvotes: 1
Reputation: 20032
Short answer:
The default action in awk for a match is printing the line:
awk 'NR%8~/3|7|0/' input.csv
Long answer, inspired by the comments of @kvantour
awk 'NR%8~/3|7|0/' input.csv
# or shorter (when modulo < 10)
awk 'NR%8~/[037]/' input.csv
When you need modulo > 9, you need to match the complete remainder with the ^ and $ anchors; otherwise a pattern such as 3 would also match 13 and 23. With modulo 25 and lines 3,7,8,11,14,22 you can use
awk 'NR%25~/^(3|7|8|11|14|22)$/' input.csv
# or shorter
awk 'NR%25~/^([378]|1[14]|22)$/' input.csv
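A quick way to see why the anchors matter is to test every possible remainder of modulo 25 against an unanchored and an anchored pattern. This is a Python sketch, not awk; for this simple alternation Python's re module behaves the same way as awk's regular expressions, and the variable names are just for illustration.

```python
import re

# Remainders we want to keep for modulo 25: 3, 7, 8, 11, 14, 22
unanchored = re.compile(r"3|7|8|11|14|22")
anchored = re.compile(r"^(3|7|8|11|14|22)$")

remainders = range(25)
loose = [r for r in remainders if unanchored.search(str(r))]
strict = [r for r in remainders if anchored.search(str(r))]

print(loose)   # [3, 7, 8, 11, 13, 14, 17, 18, 22, 23]
print(strict)  # [3, 7, 8, 11, 14, 22]
```

Without the anchors, 13, 17, 18 and 23 slip through because they merely contain a wanted digit; the anchored regular expression selects exactly the intended remainders.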
This becomes harder to read for more values. An alternative is
# Original case
awk 'BEGIN {a[3];a[7];a[0]} NR%8 in a' input.csv
# 3,7,8,11,14,22
awk 'BEGIN {a[3];a[7];a[8];a[11];a[14];a[22];} NR%25 in a' input.csv
Pulling the numbers out of the awk script (the <(...) process substitution needs a shell such as bash):
# Original case
awk 'FNR==NR {a[$0];next} FNR%8 in a' <(printf "%s\n" 3 7 0) input.csv
# 3,7,8,11,14,22
awk 'FNR==NR {a[$0];next} FNR%25 in a' <(printf "%s\n" 3 7 8 11 14 22) input.csv
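For comparison, the same "numbers kept outside the program logic" idea could look roughly like this in Python. It is only a sketch: keep_lines is a hypothetical function name, and line numbers are counted from 1 to match awk's NR/FNR.

```python
def keep_lines(lines, remainders, modulus):
    """Yield lines whose 1-based line number modulo `modulus` is in `remainders`."""
    wanted = set(remainders)  # plays the role of the awk array a
    for number, line in enumerate(lines, start=1):
        if number % modulus in wanted:
            yield line


sample = [f"[Line {n}]" for n in range(1, 17)]

# Original case: remainders 3, 7 and 0 with modulo 8
result = list(keep_lines(sample, [3, 7, 0], 8))
print(result)
```

For the 16-line sample this keeps lines 3, 7, 8, 11, 15 and 16; changing the list and the modulus covers the modulo 25 case without touching the function.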
Upvotes: 1
Reputation: 40033
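The Python approach is just
import sys
for i, line in enumerate(sys.stdin):
    # i is 0-based, so lines 3, 7 and 8 of each block are indices 2, 6 and 7
    if i % 8 in (2, 6, 7):
        print(line, end='')  # line already ends with a newline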
Upvotes: 3