PART 1:
So, I've got a file (inputfile) that looks like this:
inputfile
unimportant stuff ...
col1 col2 col3
26 ACE 0
27 ACE 0
28 ACE 0
...
32 CCY 1
33 CCY 1
34 CCY 1
...
42 NME 2
43 NME 2
44 NME 2
...
48 MMP 3
49 MMP 3
50 MMP 3
...
54 SCY 1
55 SCY 1
56 SCY 1
...
65 MMP 2
66 MMP 2
67 MMP 2
... etc
422 XXX 0
423 XXX 1
Desired output
outputfile1
col1 col2 col3
26 ACE 0
32 CCY 1
42 NME 2
48 MMP 3
54 SCY 1
65 MMP 2
Any ideas on how to approach this using awk/sed/grep (or some other program) to produce the desired output? In words, I'm trying to develop a script that starts at the line where col1 = 26 and, from there until the end of the file, prints a line only when col3 changes. I also want to remove any line with XXX in col2.
PART 2:
After that, I would like to produce a new file (outputfile2) that depends on col3 of outputfile1. Every time the count in col3 resets (that is, drops back to 0 or 1 and starts counting up again), I want to print something like the following to outputfile2:
outputfile2
26 - 53
ACE_CCY_NME_MMP
54 - ...
SCY_MMP_...
Ideally, for each run between resets it would print:
line1: "first col1 entry of the run" - "first col1 entry of the next run, minus 1"
line2: all col2 entries in the run, joined with underscores (col2_col2_col2_col2), etc.
How would I best achieve these results?
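One possible way to approach Part 2, as a minimal awk sketch rather than a definitive solution. It assumes outputfile1 holds one row per col3 change (as in the desired output above), treats any drop in col3 as a reset, and skips any line whose first field is not a number (such as a header). The variable names (start, names, prev, started) are made up for illustration. Because the last run has no following run to take its end from, the sketch prints the literal word "end" there; that placeholder is an assumption, not part of the requested format. The first line only creates a small sample file from the values shown in the question.

```shell
# Hypothetical sample standing in for outputfile1 (values taken from the question).
printf '%s\n' 'col1 col2 col3' '26 ACE 0' '32 CCY 1' '42 NME 2' '48 MMP 3' '54 SCY 1' '65 MMP 2' > outputfile1

awk '
  $1 !~ /^[0-9]+$/ { next }              # skip the header or any non-data line
  started && ($3 + 0) < prev {           # col3 dropped: close the current run
    print start " - " ($1 - 1)           # run ends one before the new start
    print names
    start = ""; names = ""
  }
  start == "" { start = $1 }             # first row of a new run
  {
    names = (names == "" ? $2 : names "_" $2)   # join col2 entries with "_"
    prev = $3 + 0
    started = 1
  }
  END {
    # Last run: its end is unknown, so print a placeholder (assumption).
    if (started) { print start " - end"; print names }
  }
' outputfile1 > outputfile2

cat outputfile2
```

With the sample rows this writes four lines to outputfile2: "26 - 53", "ACE_CCY_NME_MMP", "54 - end" and "SCY_MMP".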
PART 1 SOLVED:
awk '$1 == "26" {f=1}f {print $0}' inputfile | uniq -f 2 | sed '/XXX/d' > outputfile1
Which produces:
26 ACE 0
32 CCY 1
42 NME 2
48 MMP 3
54 SCY 1
64 MMP 2
...
Explanation: awk prints every line from the first one with 26 in col1 to the end of the file. That output is piped to uniq -f 2, which skips the first two fields when comparing and so drops adjacent lines that repeat the same col3 value. Finally, sed deletes any line containing the unwanted string 'XXX'. I would appreciate it if anyone could explain the awk {f=1}f part in more detail.
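A note on the {f=1}f idiom, offered as a sketch rather than a definitive answer. An awk program is a list of `pattern { action }` rules, so the command above contains two rules: `$1 == "26" {f=1}` sets a flag variable f once a line with 26 in col1 is seen, and `f {print $0}` prints every line for which f is true. An unset awk variable counts as false, so nothing is printed before the flag is set. The same idea can be written as a single awk script that also handles the col3 and XXX steps. Note one difference from the pipeline: this version drops XXX lines before comparing col3, whereas the pipeline runs uniq first. The sample file and the names seen and prev are assumptions made for illustration.

```shell
# Hypothetical, shortened sample standing in for inputfile.
printf '%s\n' 'unimportant stuff' 'col1 col2 col3' \
  '26 ACE 0' '27 ACE 0' '32 CCY 1' '33 CCY 1' '42 NME 2' \
  '54 SCY 1' '55 SCY 1' '422 XXX 0' '423 XXX 1' > inputfile

awk '
  $1 == 26     { f = 1 }        # turn the flag on at the first line with 26 in col1
  !f           { next }         # flag still off: skip the line
  $2 == "XXX"  { next }         # drop unwanted rows
  !seen || $3 != prev { print } # print only when col3 differs from the previous kept row
  { prev = $3; seen = 1 }       # remember col3 for the next comparison
' inputfile > outputfile1

cat outputfile1
```

For this sample the kept rows are 26 ACE 0, 32 CCY 1, 42 NME 2 and 54 SCY 1.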