noconceptoflunch

Reputation: 9

Rearrange text file based on matches in columns

I am working with a data set:

ALI P 18:00:40.583 0.0

ALI S 18:00:58.188 1.4

BRD Pg 18:00:48.918 0.4

BRD Sg 18:01:09.437 -1.8

GAN Pn 18:00:58.207 -0.0

GAN Sn 18:01:27.791 0.1

GLB P 18:00:27.265 -0.4

GLB S 18:00:34.187 0.1

GOB S 18:01:13.638 -0.6

IML Pg 18:00:52.264 -0.6

I am using AWK, and I need lines that share the same first column to be printed on the same line.

For example:

ALI P 18:00:40.583 0.0 ALI S 18:00:58.188 1.4

BRD Pg 18:00:48.918 0.4 BRD Sg 18:01:09.437 -1.8

I've tried all sorts of different ideas but cannot find code that does this. I have been using AWK, as instructed by my superior, but I would be interested to see whether it would be easier in Python.

(Note: the blank lines between the rows above are only there to preserve the layout in this post.)

Upvotes: 0

Views: 107

Answers (3)

Martin Evans

Reputation: 46779

This could be approached in Python as follows:

from itertools import groupby

data = """ALI P 18:00:40.583 0.0
ALI S 18:00:58.188 1.4
BRD Pg 18:00:48.918 0.4
BRD Sg 18:01:09.437 -1.8
GAN Pn 18:00:58.207 -0.0
GAN Sn 18:01:27.791 0.1
GLB P 18:00:27.265 -0.4
GLB S 18:00:34.187 0.1
GOB S 18:01:13.638 -0.6
IML Pg 18:00:52.264 -0.6"""

# groupby merges only consecutive lines that share the same first field
print('\n'.join(' '.join(g) for k, g in groupby(data.splitlines(), key=lambda x: x.split()[0])))

This would display:

ALI P 18:00:40.583 0.0 ALI S 18:00:58.188 1.4
BRD Pg 18:00:48.918 0.4 BRD Sg 18:01:09.437 -1.8
GAN Pn 18:00:58.207 -0.0 GAN Sn 18:01:27.791 0.1
GLB P 18:00:27.265 -0.4 GLB S 18:00:34.187 0.1
GOB S 18:01:13.638 -0.6
IML Pg 18:00:52.264 -0.6    
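
If the data comes from a file rather than an inline string, the same groupby idea could be wrapped in a small function. This is only a sketch: the name merge_consecutive is made up for illustration, it targets Python 3, and it skips blank lines on the assumption that the input may contain them. As with the one-liner above, only consecutive lines with the same first field are merged.

```python
from itertools import groupby


def merge_consecutive(lines):
    """Join consecutive lines that share the same first field.

    `lines` is any iterable of strings (a list, or an open file object).
    The function name is hypothetical, chosen for this sketch.
    """
    # Strip line endings and drop blank lines so split()[0] never fails.
    records = [line.strip() for line in lines if line.strip()]
    return [' '.join(group)
            for _, group in groupby(records, key=lambda rec: rec.split()[0])]


sample = [
    "ALI P 18:00:40.583 0.0",
    "ALI S 18:00:58.188 1.4",
    "GOB S 18:01:13.638 -0.6",
]
for row in merge_consecutive(sample):
    print(row)
```

Since iterating over an open file yields its lines, a file object could be passed in place of the list.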

Upvotes: 1

karakfa

Reputation: 67507

Another awk approach:

$ awk '{a[$1]=a[$1]?a[$1] FS $0:$0} 
    END{for(k in a) print a[k] | "sort" }' file | column -t

ALI  P   18:00:40.583  0.0   ALI  S   18:00:58.188  1.4
BRD  Pg  18:00:48.918  0.4   BRD  Sg  18:01:09.437  -1.8
GAN  Pn  18:00:58.207  -0.0  GAN  Sn  18:01:27.791  0.1
GLB  P   18:00:27.265  -0.4  GLB  S   18:00:34.187  0.1
GOB  S   18:01:13.638  -0.6
IML  Pg  18:00:52.264  -0.6

This accumulates records that share the same key, prints them at the end, and sorts the output (by the key); column -t is only there for pretty-printing. It doesn't require the keys to be contiguous or sorted.
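
For comparison, the same accumulate-then-sort idea might look like this in Python. It is a rough sketch rather than a translation of the awk script: the name merge_by_key is hypothetical, blank lines are skipped as an assumption about the input, and the result is sorted by key instead of being piped through sort.

```python
def merge_by_key(lines):
    """Collect all lines sharing a first field, then emit them sorted by key.

    Unlike a groupby-based approach, the keys need not be contiguous.
    The function name is hypothetical, chosen for this sketch.
    """
    merged = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        key = line.split()[0]
        # Append this record to the list kept for its key.
        merged.setdefault(key, []).append(line)
    return [' '.join(merged[key]) for key in sorted(merged)]


sample = [
    "GOB S 18:01:13.638 -0.6",
    "ALI P 18:00:40.583 0.0",
    "ALI S 18:00:58.188 1.4",
]
for row in merge_by_key(sample):
    print(row)
```

The trade-off is the same as in the awk version: every record is held in memory until the end of the input.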

Upvotes: 1

John1024

Reputation: 113924

As I understand it, you are matching on the first field and the file is sorted. In that case, try:

$ awk 'NR>1{printf "%s%s",($1==last?" ":"\n"),$0}; NR==1{printf "%s",$0} {last=$1} END{print""}' file
ALI P 18:00:40.583 0.0 ALI S 18:00:58.188 1.4
BRD Pg 18:00:48.918 0.4 BRD Sg 18:01:09.437 -1.8
GAN Pn 18:00:58.207 -0.0 GAN Sn 18:01:27.791 0.1
GLB P 18:00:27.265 -0.4 GLB S 18:00:34.187 0.1
GOB S 18:01:13.638 -0.6
IML Pg 18:00:52.264 -0.6

How it works

  • NR==1{printf "%s",$0}

    For the first line, we print it with no trailing newline.

  • NR>1{printf "%s%s",($1==last?" ":"\n"),$0}

    For lines after the first, we print a space if the first fields match or a newline if they don't, followed by the line.

    The tricky-looking part here is the ternary expression $1==last?" ":"\n". It tests whether the current first field equals the first field of the previous line, which is stored in last. If it does, the expression evaluates to the string after the ? (a space). If it doesn't, it evaluates to the string after the : (a newline).

  • last=$1

    We update the variable last to the most recent first field.

  • END{print""}

    After we have finished reading the file, we print a newline to make sure that the final line is complete.
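
Since the question also asks about Python, the line-by-line logic described above could be sketched as a generator. This is an illustration under the same assumption that the input is sorted on the first field; the name merge_sorted is hypothetical, and skipping blank lines is an added assumption that the awk command does not make.

```python
def merge_sorted(lines):
    """Yield merged lines, comparing each first field with the previous one.

    Mirrors the awk logic: remember the last key, append to the current
    output line on a match, start a new output line otherwise.
    The function name is hypothetical, chosen for this sketch.
    """
    last = None      # first field of the previous record
    current = None   # output line being built
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        key = line.split()[0]
        if current is not None and key == last:
            current += ' ' + line
        else:
            if current is not None:
                yield current  # previous group is complete
            current = line
        last = key
    if current is not None:
        yield current  # flush the final group, like the END block


sample = [
    "GLB P 18:00:27.265 -0.4",
    "GLB S 18:00:34.187 0.1",
    "IML Pg 18:00:52.264 -0.6",
]
for row in merge_sorted(sample):
    print(row)
```

Like the awk command, this keeps only one output line in memory at a time.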

Upvotes: 1
