Reputation: 35
NEED: I have a file containing data like the sample below. I need to:
In this case: Merge all lines into one where fields $1 through $6 and $8 match, create a range of the associated values in field $7
OR
Merge all lines into one where fields $1 through $7 match, create a range of the associated values in field $8*
@rtx13's solution below using TCL does work (thanks again), I'm just not sure if I can install TCL in my live environment so I hope and AWK/BASH/etc. solution can also be proposed.
Original data:
HOST FILTER INTERFACE SOURCE DESTINATION PROTOCOL SOURCE PORT DESTINATION PORT
host1 input nic1 ip1 ip2 PROT 30000 10
host1 input nic1 ip1 ip2 PROT 50000 10
host1 input nic1 ip1 ip2 PROT 60000 10
host1 input nic1 ip3 ip2 PROT 10 30000
host1 input nic1 ip3 ip2 PROT 10 50000
host1 input nic1 ip3 ip2 PROT 10 60000
host1 output nic1 ip2 ip1 PROT 10 30000
host1 output nic1 ip2 ip1 PROT 10 50000
host1 output nic1 ip2 ip1 PROT 10 60000
host1 output nic1 ip2 ip3 PROT 30000 10
host1 output nic1 ip2 ip3 PROT 60000 10
host1 output loc ip2 ip2 PROT 10 30000
host1 output loc ip2 ip2 PROT 10 50000
Desired Processed Output:
host1 input nic1 ip1 ip2 PROT 30000:60000 10
host1 input nic1 ip3 ip2 PROT 10 30000:60000
host1 output nic1 ip2 ip1 PROT 10 30000:60000
host1 output nic1 ip2 ip3 PROT 30000:60000 10
host1 output loc ip2 ip2 PROT 10 30000:50000
Do I need a separate file that lists the ports I'd like to merge on? Since these could also fall inside a range as well? So the script could reference it to know whether it's a merge port or not?
10|EXAMPLE
22|SSH
80|HTTP
2049|NFS
*etc*
I'm trying to be as specific as I can be since my original post was deemed to need more focus; it received an answer (Thanks you @Enrico Maria De Angelis) that was close, but not quite all the way there as well as one that used datamash (Thank you @oguz ismail), which worked great but sadly I need a solution not requiring installation of additonal software: Merge all lines that are identical aside from a key field and make key field a range
Upvotes: 2
Views: 87
Reputation: 2610
The following works on both columns. Provide the list of ports to merge on via command line.
mergecolumn2
script:
#!/usr/bin/tclsh
# read data in and store into 'output' array and 'row_key' list
set rows [lrange [split [read -nonewline stdin] "\n"] 1 end]
foreach row $rows {
if { [lsearch -exact $argv [lindex $row 6]] >= 0 } {
set merge_column 7
} elseif { [lsearch -exact $argv [lindex $row 7]] >= 0 } {
set merge_column 6
} else {
set merge_column -1 ;# do not merge
}
if { $merge_column >= 0 } {
set value [lindex $row $merge_column]
set row [lreplace $row $merge_column $merge_column "%s"]
} else {
set value ""
set row [list {*}$row] ;# trim spaces for consistency
}
if { ![info exist output($row)] } {
lappend row_key $row
}
lappend output($row) $value
}
# iterate over 'row_key' and generate output
foreach row $row_key {
if { [llength $output($row)] > 1 } {
set o [lreplace [lsort -dictionary $output($row)] 1 end-1]
} else {
set o $output($row)
}
puts [format $row [join $o :]]
}
Terminal session:
$ ./mergecolumn2 < data
host1 input nic1 ip1 ip2 PROT 30000 10
host1 input nic1 ip1 ip2 PROT 50000 10
host1 input nic1 ip1 ip2 PROT 60000 10
host1 input nic1 ip3 ip2 PROT 10 30000
host1 input nic1 ip3 ip2 PROT 10 50000
host1 input nic1 ip3 ip2 PROT 10 60000
host1 output nic1 ip2 ip1 PROT 10 30000
host1 output nic1 ip2 ip1 PROT 10 50000
host1 output nic1 ip2 ip1 PROT 10 60000
host1 output nic1 ip2 ip3 PROT 30000 10
host1 output nic1 ip2 ip3 PROT 60000 10
host1 output loc ip2 ip2 PROT 10 30000
host1 output loc ip2 ip2 PROT 10 50000
$ ./mergecolumn2 10 22 80 2049 < data
host1 input nic1 ip1 ip2 PROT 30000:60000 10
host1 input nic1 ip3 ip2 PROT 10 30000:60000
host1 output nic1 ip2 ip1 PROT 10 30000:60000
host1 output nic1 ip2 ip3 PROT 30000:60000 10
host1 output loc ip2 ip2 PROT 10 30000:50000
$
Upvotes: 1
Reputation: 2610
The following works on one column at a time.
mergecolumn
script:
#!/usr/bin/tclsh
# check for column number
if { ![llength $argv] || ![regexp {^\d+$} [lindex $argv 0] merge_column] } {
puts stderr "usage: $argv0 column (0-based)"
exit 1
}
# read data in and store into 'output' array and 'row_key' list
set rows [lrange [split [read -nonewline stdin] "\n"] 1 end]
foreach row $rows {
set value [lindex $row $merge_column]
set row [lreplace $row $merge_column $merge_column "%s"]
if { ![info exist output($row)] } {
lappend row_key $row
}
lappend output($row) $value
}
# iterate over 'row_key' and generate output
foreach row $row_key {
if { [llength $output($row)] > 1 } {
set o [lreplace [lsort -dictionary $output($row)] 1 end-1]
} else {
set o $output($row)
}
puts [format $row [join $o :]]
}
Terminal session:
$ cat data
HOST FILTER INTERFACE SOURCE DESTINATION PROTOCOL SOURCE PORT DESTINATION PORT
host1 input nic1 ip1 ip2 PROT 30000 10
host1 input nic1 ip1 ip2 PROT 50000 10
host1 input nic1 ip1 ip2 PROT 60000 10
host1 input nic1 ip3 ip2 PROT 10 30000
host1 input nic1 ip3 ip2 PROT 10 50000
host1 input nic1 ip3 ip2 PROT 10 60000
host1 output nic1 ip2 ip1 PROT 10 30000
host1 output nic1 ip2 ip1 PROT 10 50000
host1 output nic1 ip2 ip1 PROT 10 60000
host1 output nic1 ip2 ip3 PROT 30000 10
host1 output nic1 ip2 ip3 PROT 60000 10
host1 output loc ip2 ip2 PROT 10 30000
host1 output loc ip2 ip2 PROT 10 50000
$ ./mergecolumn 6 < data
host1 input nic1 ip1 ip2 PROT 30000:60000 10
host1 input nic1 ip3 ip2 PROT 10 30000
host1 input nic1 ip3 ip2 PROT 10 50000
host1 input nic1 ip3 ip2 PROT 10 60000
host1 output nic1 ip2 ip1 PROT 10 30000
host1 output nic1 ip2 ip1 PROT 10 50000
host1 output nic1 ip2 ip1 PROT 10 60000
host1 output nic1 ip2 ip3 PROT 30000:60000 10
host1 output loc ip2 ip2 PROT 10 30000
host1 output loc ip2 ip2 PROT 10 50000
$ ./mergecolumn 7 < data
host1 input nic1 ip1 ip2 PROT 30000 10
host1 input nic1 ip1 ip2 PROT 50000 10
host1 input nic1 ip1 ip2 PROT 60000 10
host1 input nic1 ip3 ip2 PROT 10 30000:60000
host1 output nic1 ip2 ip1 PROT 10 30000:60000
host1 output nic1 ip2 ip3 PROT 30000 10
host1 output nic1 ip2 ip3 PROT 60000 10
host1 output loc ip2 ip2 PROT 10 30000:50000
$
Upvotes: 1