Reputation: 9
I am working with a data set:
ALI P 18:00:40.583 0.0
ALI S 18:00:58.188 1.4
BRD Pg 18:00:48.918 0.4
BRD Sg 18:01:09.437 -1.8
GAN Pn 18:00:58.207 -0.0
GAN Sn 18:01:27.791 0.1
GLB P 18:00:27.265 -0.4
GLB S 18:00:34.187 0.1
GOB S 18:01:13.638 -0.6
IML Pg 18:00:52.264 -0.6
I am using AWK and need lines whose first field matches to be printed on the same line.
i.e.
ALI P 18:00:40.583 0.0 ALI S 18:00:58.188 1.4
BRD Pg 18:00:48.918 0.4 BRD Sg 18:01:09.437 -1.8
I have tried several different ideas but have not found code that does this. I am using AWK, as instructed by my superior, but I would also be interested to know whether this would be easier in Python.
(note white space between lines to preserve structure)
Upvotes: 0
Views: 107
Reputation: 46779
This could be approached in Python as follows:
from itertools import groupby
data = """ALI P 18:00:40.583 0.0
ALI S 18:00:58.188 1.4
BRD Pg 18:00:48.918 0.4
BRD Sg 18:01:09.437 -1.8
GAN Pn 18:00:58.207 -0.0
GAN Sn 18:01:27.791 0.1
GLB P 18:00:27.265 -0.4
GLB S 18:00:34.187 0.1
GOB S 18:01:13.638 -0.6
IML Pg 18:00:52.264 -0.6"""
# Group adjacent lines by their first field and join each group with spaces (Python 3 print)
print('\n'.join(' '.join(g) for k, g in groupby(data.splitlines(), key=lambda x: x.split()[0])))
This would display:
ALI P 18:00:40.583 0.0 ALI S 18:00:58.188 1.4
BRD Pg 18:00:48.918 0.4 BRD Sg 18:01:09.437 -1.8
GAN Pn 18:00:58.207 -0.0 GAN Sn 18:01:27.791 0.1
GLB P 18:00:27.265 -0.4 GLB S 18:00:34.187 0.1
GOB S 18:01:13.638 -0.6
IML Pg 18:00:52.264 -0.6
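One caveat worth spelling out: groupby only merges adjacent items, so this relies on lines with the same first field sitting next to each other, as they do in the sample data. Below is a minimal sketch of the same idea wrapped in a function that accepts any iterable of lines (an open file, for instance). The name join_adjacent and the short sample list are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def join_adjacent(lines):
    """Join consecutive lines that share the same first field.

    Hypothetical helper: blank lines are skipped, and only *adjacent*
    lines with equal keys are merged (this is how groupby behaves).
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    return [' '.join(group)
            for _, group in groupby(rows, key=lambda ln: ln.split()[0])]


# Illustrative input; with a real file one could pass the open file object,
# e.g. `with open(path) as fh: join_adjacent(fh)`.
sample = [
    "ALI P 18:00:40.583 0.0",
    "ALI S 18:00:58.188 1.4",
    "GOB S 18:01:13.638 -0.6",
]
for row in join_adjacent(sample):
    print(row)
```

If the input might not be sorted by station, sorting it first (or collecting lines in a dict) would be needed to get one output line per key.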
Upvotes: 1
Reputation: 67507
Another awk approach:
$ awk '{a[$1] = ($1 in a) ? (a[$1] FS $0) : $0}
END{for(k in a) print a[k] | "sort" }' file | column -t
ALI P 18:00:40.583 0.0 ALI S 18:00:58.188 1.4
BRD Pg 18:00:48.918 0.4 BRD Sg 18:01:09.437 -1.8
GAN Pn 18:00:58.207 -0.0 GAN Sn 18:01:27.791 0.1
GLB P 18:00:27.265 -0.4 GLB S 18:00:34.187 0.1
GOB S 18:01:13.638 -0.6
IML Pg 18:00:52.264 -0.6
This accumulates records with the same key, prints them at the end, and sorts the output (by the key); column -t is only for pretty-printing. It doesn't require the keys to be contiguous or sorted.
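Since the question also asks about Python, the same accumulate-then-print idea could be sketched with a dict standing in for the awk array. This is only an illustration, not a translation of the exact pipeline: join_by_key is a hypothetical name, and the result is sorted by key in Python rather than piped through sort.

```python
def join_by_key(lines):
    """Collect lines under their first field, then emit one joined line per key.

    Hypothetical helper: keys need not be contiguous or sorted in the input.
    """
    merged = {}
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        merged.setdefault(fields[0], []).append(line.strip())
    # Sort by key so the output order does not depend on input order
    return [' '.join(merged[key]) for key in sorted(merged)]


# Illustrative, deliberately unsorted input
rows = join_by_key([
    "BRD Pg 18:00:48.918 0.4",
    "ALI P 18:00:40.583 0.0",
    "BRD Sg 18:01:09.437 -1.8",
])
for row in rows:
    print(row)
```

Like the awk version, this holds the whole input in memory before printing anything, which is the price of not requiring sorted input.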
Upvotes: 1
Reputation: 113924
As I understand it, you are matching on the first field and the file is sorted. In that case, try:
$ awk 'NR>1{printf "%s%s",($1==last?" ":"\n"),$0}; NR==1{printf "%s",$0} {last=$1} END{print""}' file
ALI P 18:00:40.583 0.0 ALI S 18:00:58.188 1.4
BRD Pg 18:00:48.918 0.4 BRD Sg 18:01:09.437 -1.8
GAN Pn 18:00:58.207 -0.0 GAN Sn 18:01:27.791 0.1
GLB P 18:00:27.265 -0.4 GLB S 18:00:34.187 0.1
GOB S 18:01:13.638 -0.6
IML Pg 18:00:52.264 -0.6
NR==1{printf "%s",$0}
For the first line, we print it with no trailing newline.
NR>1{printf "%s%s",($1==last?" ":"\n"),$0}
For lines after the first, we print a space if the first fields match or a newline if they don't, followed by the line.
The tricky-looking part here is the ternary expression $1==last?" ":"\n". This just tests whether the first field is equal to the first field of the previous line. If it is, it returns the string after the ?. If it isn't, it returns the string after the :.
last=$1
We update the variable last to the most recent first field.
END{print""}
After we have finished reading the file, we print a newline to make sure that the final line is complete.
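For comparison with the Python question in the original post, the same line-by-line logic could be sketched as follows. This is a rough mirror of the awk steps above under the same assumption (input sorted on the first field); join_streaming and its out parameter are hypothetical names. One small difference: the sketch writes the final newline only if at least one line was printed, whereas the awk END block always prints it.

```python
import io
import sys


def join_streaming(lines, out=sys.stdout):
    """Write lines to `out`, joining each line to the previous one when
    their first fields match (hypothetical helper, assumes sorted input)."""
    last = None     # first field of the previous line, like awk's `last`
    started = False
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if not started:
            out.write(line)  # first line: no leading separator
            started = True
        else:
            # space if the key repeats, newline if it changes
            out.write((" " if fields[0] == last else "\n") + line)
        last = fields[0]
    if started:
        out.write("\n")  # complete the final line


# Illustrative run, capturing the output in a string buffer
buf = io.StringIO()
join_streaming(
    ["GLB P 18:00:27.265 -0.4", "GLB S 18:00:34.187 0.1", "GOB S 18:01:13.638 -0.6"],
    out=buf,
)
print(buf.getvalue(), end="")
```

Because it keeps only the previous key in memory, this version can process arbitrarily large files, but it will not merge matching lines that are not adjacent.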
Upvotes: 1