Patrick

Reputation: 339

Conditionally print line only when column entry doesn't match previous line's column

PART 1:

So, I've got a file (inputfile) that looks like this:

inputfile

unimportant stuff ...
col1    col2     col3
26      ACE      0  
27      ACE      0  
28      ACE      0  
...  
32      CCY      1  
33      CCY      1  
34      CCY      1  
...  
42      NME      2  
43      NME      2  
44      NME      2  
...   
48      MMP      3  
49      MMP      3  
50      MMP      3  
...  
54      SCY      1  
55      SCY      1  
56      SCY      1  
...
65      MMP      2  
66      MMP      2  
67      MMP      2
... etc
422     XXX      0
423     XXX      1

Desired output

outputfile1

col1    col2     col3
26      ACE      0  
32      CCY      1  
42      NME      2  
48      MMP      3  
54      SCY      1  
65      MMP      2

Any ideas on how to produce the desired output using awk, sed, grep, or some other program? In words, I'm trying to write a script that starts at the line where col1 = 26 and, from there to the end of the file, prints a line only when its col3 differs from the previous line's col3. I also want to remove every line with XXX in col2.

PART 2:

Following this, I would like to produce a new file (outputfile2) that depends on col3 of outputfile1. Every time the count in col3 resets (that is, drops back to 0 or 1 and starts counting up again), I want to print something like this to outputfile2:

outputfile2

26 - 53
ACE_CCY_NME_MMP
54 - ...
SCY_MMP_...

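Ideally, for each group it would print:

line1: "col1 entry of the group's first row" - "col1 entry of the next group's first row, minus 1"

line2: all col2 entries in the group, joined with underscores (col2_col2_col2_col2), etc.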

How would I best achieve these results?


PART 1 SOLVED:

awk '$1 == "26" {f=1}f {print $0}' inputfile | uniq -f 2 | sed '/XXX/d' > outputfile1

Which produces:

26      ACE      0
32      CCY      1
42      NME      2
48      MMP      3
54      SCY      1
64      MMP      2
...

Explanation: awk prints every line from the first line with '26' in col1 to the end of the file. The result is piped to uniq -f 2, which skips the first two fields when comparing and so drops any line whose col3 repeats that of the line before it. Finally, sed deletes every line that contains the unwanted string 'XXX'. If anyone could explain the awk {f=1}f part in more detail, that would be appreciated.
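For reference, the {f=1}f part appears to be the common awk "flag" idiom: f is an ordinary variable (unset, so false, at the start), and a pattern with no action prints the line by default. A long-form sketch of the same program, run on a few sample rows fed in with printf as a stand-in for inputfile (the sample rows and the result variable are illustrative assumptions, not part of the original command):

```shell
# Long-form sketch of: awk '$1 == "26" {f=1}f {print $0}'
# The printf rows are a hypothetical stand-in for inputfile.
result=$(printf '%s\n' 'unimportant stuff' 'col1 col2 col3' \
                       '26 ACE 0' '27 ACE 0' '32 CCY 1' |
awk '
  $1 == "26" { f = 1 }      # rule 1: once col1 is 26, set the flag f
  f          { print $0 }   # rule 2: the pattern f is true once f is nonzero
')
printf '%s\n' "$result"
```

Both rules are tried on every line, in order, so the line with 26 in col1 sets the flag and is printed by the second rule in the same pass. Since print $0 is the default action, the second rule can be shortened to a bare f.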

Upvotes: 2

Views: 213

Answers (1)

perreal

Reputation: 98078

This is for the first output:

 uniq -f 2 input > outputfile1
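For the second output, a possible awk sketch follows. It is not a tested solution for the real data and rests on a few assumptions: outputfile1 has no header line, a "reset" means col3 is smaller than on the previous line, and the last group is closed with the placeholder word end because its final col1 value cannot be derived from outputfile1. The emit function, the variable names, and the printf sample rows (copied from the desired outputfile1 in the question) are all illustrative.

```shell
# Grouping sketch for outputfile2; printf rows stand in for outputfile1.
result=$(printf '%s\n' '26 ACE 0' '32 CCY 1' '42 NME 2' \
                       '48 MMP 3' '54 SCY 1' '65 MMP 2' |
awk '
  # Print one finished group: its col1 range, then its joined col2 names.
  function emit(last) { print start " - " last; print names }

  NR > 1 && $3 < prev {        # col3 dropped: the previous group is complete
    emit($1 - 1)
    names = ""
  }
  names == "" { start = $1 }   # first row of a new group
  { names = (names == "" ? $2 : names "_" $2); prev = $3 }

  END { if (names != "") emit("end") }   # last group has no known end row
')
printf '%s\n' "$result"
```

On the real file, the printf stage would be replaced by reading outputfile1 directly and redirecting the awk output to outputfile2.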

Upvotes: 4
