Chang Woon Jang

Reputation: 125

Sorting a file with python

I have a data file (a trajectory file) whose rows are not numerically sorted. The file consists of repeating blocks of text and numbers, like the example below. In each block the first four rows are header information, and the numeric rows to be sorted start on the fifth row. Then another four header rows follow, and the numbers start again on the fifth row after them. The file repeats this pattern for roughly a hundred blocks. I would like to sort the rows of each block numerically by the first column.

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
ITEM: ATOMES id type x y z
4959 8 10.1 20.1 41.1
5029 8 13.1 43.1 5.3
....
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
ITEM: ATOMES id type x y z
1259 8 10.1 20.1 41.1
6169 8 13.1 43.1 5.3
....
ITEM: TIMESTEP
200
ITEM: NUMBER OF ATOMS
ITEM: ATOMES id type x y z
3523 8 10.1 20.1 41.1
9119 8 13.1 43.1 5.3
....

I tried to write a Python script. My idea is to put each numeric block, between 'ITEM: ATOMES id type x y z' and the next block's header, into a list, then sort the list and print it. I have put the rows into a list, but each element (e.g., 4959 8 10.1 20.1 41.1) is a single string. How can I sort the list by the first column of each string?

This is what I have tried so far. Any advice would be appreciated.

f_in=open('aa', 'r')

def SORT(List):

        print 'ITEM: TIMESTEP'
        print 'Num of Trajectory'
        print 'ITEM: NUMBER OF ATOMS'
        print 'ATOMS'
        print 'ITEM: BOX BOUNDS pp pp pp'
        print '\n\n'
        print 'ITEM: ATOMS id type x y z'

        for p in List:
                print p

LIST=[]

a = 1

for line in f_in:

        sp = line.split()

        if(len(sp) != 5):
                continue
        else:
                if(a < 5085):
                        LIST.append(line)
                        a = a + 1
                elif(a == 5085):
                        LIST.append(line)
                        LIST = map(lambda s: s.strip(), LIST)
                        SORT(LIST)
                        a = 1

Upvotes: 1

Views: 165

Answers (3)

Quinn

Reputation: 4504

You could also try:

import re

def SORT(List):
    # print the block header, then the atom rows sorted by atom id (first column)
    print('ITEM: TIMESTEP')
    print('Num of Trajectory')
    print('ITEM: NUMBER OF ATOMS')
    print('ATOMS')
    print('ITEM: BOX BOUNDS pp pp pp')
    print('\n\n')
    print('ITEM: ATOMS id type x y z')

    for p in sorted(List, key=lambda s: int(s.split()[0])):
        print(p)

result = []  # atom rows only

# read the whole file into a list of lines
with open('aa') as f_in:
    lines = f_in.readlines()

# keep only data rows: an integer id followed by four more fields;
# ITEM: header lines and the single-number timestep lines (0, 100, ...) are skipped
for line in lines:
    m = re.match(r'^\d+(\s+\S+){4}$', line.strip())
    if m:
        result.append(m.group(0))

# split the result list into chunks of 5085 rows (one chunk per timestep)
for i in range(0, len(result), 5085):
    SORT(result[i:i + 5085])
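The filter-then-chunk idea can be tried on a small in-memory input before running it on the real file. This is a sketch, not Quinn's exact code: the helper name `sort_chunks` and its `atoms_per_block` parameter are hypothetical, the input is a made-up two-block sample built from the question's rows (reordered so the sort has something to do), and it assumes every block holds the same number of atom rows, just as the fixed chunk size of 5085 does.

```python
import re

# A data row: an integer atom id followed by four more whitespace-separated fields.
ROW = re.compile(r'^\d+(\s+\S+){4}$')

def sort_chunks(lines, atoms_per_block):
    """Hypothetical helper: keep only data rows, cut them into fixed-size
    chunks, and sort each chunk by its first column as an integer."""
    rows = [line.strip() for line in lines if ROW.match(line.strip())]
    chunks = []
    for i in range(0, len(rows), atoms_per_block):
        chunk = rows[i:i + atoms_per_block]
        chunks.append(sorted(chunk, key=lambda s: int(s.split()[0])))
    return chunks

# Made-up two-block sample in the question's layout, rows out of order.
sample = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
ITEM: ATOMES id type x y z
5029 8 13.1 43.1 5.3
4959 8 10.1 20.1 41.1
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
ITEM: ATOMES id type x y z
6169 8 13.1 43.1 5.3
1259 8 10.1 20.1 41.1
""".splitlines()

for chunk in sort_chunks(sample, 2):
    print(chunk)
```

If the blocks can contain different numbers of atoms, a fixed chunk size will misalign them, and splitting on the ITEM: TIMESTEP lines is the safer choice.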

Upvotes: 0

Martin Evans

Reputation: 46759

The following script will read in your file and sort the rows within each block:

from itertools import groupby

with open('input.txt') as f_input, open('output.txt', 'w') as f_output:
    for k, g in groupby(f_input, lambda x: x != 'ITEM: TIMESTEP\n'):
        if k:
            entries = [line.strip() for line in g]
            block_header = ['ITEM: TIMESTEP'] + entries[:3]
            entries = sorted([line.split() for line in entries[3:]], key=lambda x: int(x[0]))
            f_output.write('\n'.join(block_header) + '\n')

            for row in entries:
                f_output.write(' '.join(row) + '\n')

It uses itertools.groupby to split the file into blocks at each ITEM: TIMESTEP line. It strips the newline from each row, keeps the three header lines that follow ITEM: TIMESTEP, and treats the remaining rows as the values. It splits each of these rows on whitespace and sorts them, using the first entry converted to an integer as the key.

It then writes each block to the output file: first that block's own header lines, then its sorted rows.
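To see exactly what groupby hands to the loop, the same key function can be run on a small in-memory copy of the file. This sketch uses io.StringIO in place of the real input file and a made-up two-block sample in the question's layout; it only prints the groups and does not sort anything.

```python
import io
from itertools import groupby

# Made-up two-block input in the question's layout (rows out of order).
text = (
    "ITEM: TIMESTEP\n0\nITEM: NUMBER OF ATOMS\nITEM: ATOMES id type x y z\n"
    "5029 8 13.1 43.1 5.3\n4959 8 10.1 20.1 41.1\n"
    "ITEM: TIMESTEP\n100\nITEM: NUMBER OF ATOMS\nITEM: ATOMES id type x y z\n"
    "6169 8 13.1 43.1 5.3\n1259 8 10.1 20.1 41.1\n"
)

groups = []
for k, g in groupby(io.StringIO(text), lambda x: x != 'ITEM: TIMESTEP\n'):
    # k is False for each 'ITEM: TIMESTEP' line on its own,
    # and True for the run of lines that follows it
    groups.append((k, [line.strip() for line in g]))

for k, lines in groups:
    print(k, lines)
```

Each True group starts with the three header lines that entries[:3] keeps, followed by the atom rows that entries[3:] sorts. The ITEM: TIMESTEP line itself lands in its own False group, which is why the answer adds it back into block_header. Because the key compares against 'ITEM: TIMESTEP\n' exactly, a header line with trailing spaces would not start a new block; comparing x.strip() instead avoids that.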

Upvotes: 1

Brian Schlenker

Reputation: 5426

Once you have your list, you can sort it using the key parameter of list.sort:

numberList.sort(key=lambda line: int(line.split()[0]))

This tells sort to use the first item in the line converted to an integer as the sort key.
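The key matters once the ids have different numbers of digits, because a plain sort compares the strings character by character. A small sketch; the row starting with 10001 is made up for illustration:

```python
rows = ['5029 8 13.1 43.1 5.3', '10001 8 1.0 2.0 3.0', '4959 8 10.1 20.1 41.1']

# Default sort: string comparison, so '10001' comes first because '1' < '4'.
print(sorted(rows))
# ['10001 8 1.0 2.0 3.0', '4959 8 10.1 20.1 41.1', '5029 8 13.1 43.1 5.3']

# With key=, rows are compared by their first field as an integer.
rows.sort(key=lambda line: int(line.split()[0]))
print(rows)
# ['4959 8 10.1 20.1 41.1', '5029 8 13.1 43.1 5.3', '10001 8 1.0 2.0 3.0']
```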

However, this won't work if any lines that start with text (the ITEM: header lines) are still in the list: int() raises a ValueError on them. You will have to filter those out first.
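One way to do the filtering is to keep only lines with five fields whose first field is a whole number, which mirrors the len(sp) != 5 check in the question's own code. A sketch; `is_atom_row` is a hypothetical helper name and the input is a single made-up block:

```python
lines = [
    'ITEM: TIMESTEP', '0', 'ITEM: NUMBER OF ATOMS', 'ITEM: ATOMES id type x y z',
    '5029 8 13.1 43.1 5.3', '4959 8 10.1 20.1 41.1',
]

def is_atom_row(line):
    """Hypothetical filter: a data row has five fields and an integer first field."""
    fields = line.split()
    return len(fields) == 5 and fields[0].isdigit()

# Drop the header lines first, then sort by the first column as an integer.
numberList = [line for line in lines if is_atom_row(line)]
numberList.sort(key=lambda line: int(line.split()[0]))
print(numberList)
# ['4959 8 10.1 20.1 41.1', '5029 8 13.1 43.1 5.3']
```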

Upvotes: 0
