Reputation: 359

Sort A list of Strings Based on certain field

Overview: I have data something like this (each row is a string):

81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M 3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M 61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M

And I want to sort each row based on the first timestamp that is present in the each String, which for these four records is:

2016-07-14 01:28:59

2016-07-14 06:25:32

2016-07-14 08:26:45

2016-07-14 14:29:13

Now I know the sort() method but I don't understand how can I use here to sort all the rows based on this (timestamp) quantity, and I do need to keep the final sorted data in the same format as some other service is going to use it.

I also understand I can make the key() but I am not clear how that can be made to sort on the timestamp field.

Upvotes: 5

Answers (3)

AliceLanniste

Reputation: 46

you can use string.split()，string.split(',')[1]

Upvotes: 1

Dilettant

Reputation: 3335

If the format of the line in itself shall not be changed, maybe (I do not know the wider context of the solution) a simple shell transformation is fitting well (I know it is not a python solution).

So:

$ sort -t, -k2,2 sort_me_on_first_timestamp_field.txt 
3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M 
61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M
B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M 
81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M

Looks quite OK to me. the -t option tells sort to use the comma as the delimiter, the -k2,2 requests sorting based on the second "field" (it starts counting at one). sometimes it is important to switch with -n to numerical sorting, but here with ISO datetime string of fixed length it should work with lexical sorting.

Again: If you are looking for a pure python solution, I suggest picking the suggested python based answer. This here only suggests a baseline alternative.

Update to "measure" some scenario on some machine - well:

On the "machine of the developer", sorting the sample 4 lines concatenated multiple times into files of 20, 200, 2000, ..., 2,000,000 lines take from 12 milli seconds to 1.7 seconds (for 2 million lines) to sort with the sort command writing to /dev/null and 2 seconds writing to a file.

A naive implementation of @juanpa.arrivillaga's proposed route sorting in-place:

#! /usr/bin/env python
FILE_PATH_IN = './fhf.txt'
NL, FS = '\n', ','

list_of_strings = open(FILE_PATH_IN).read().split(NL)[:-1]
list_of_strings.sort(key=lambda s: s.split(FS)[1])
with open(FILE_PATH_IN + ".out", "wt") as f:
    f.write(NL.join(list_of_strings))

on the same machine takes approx. 3 seconds for the 2 million line case as the other variant (using sorted to generate a new list) does:

#! /usr/bin/env python
FILE_PATH_IN = './fhf.txt'
NL, FS = '\n', ','

list_of_strings = open(FILE_PATH_IN).read().split(NL)[:-1]
with open(FILE_PATH_IN + ".out", "wt") as f:
    f.write(NL.join(sorted(list_of_strings, key=lambda s: s.split(',')[1])))

So suggested is, to use the pure python solution.

Upvotes: 2

juanpa.arrivillaga

Reputation: 95948

You can use the list method list.sort which sorts in-place or use the sorted() built-in function which returns a new list. the key argument takes a function which it applies to each element of the sequence before sorting. You can use a combination of string.split(',') and indexing to the second element, e.g. some_list[1], so:

In [8]: list_of_strings
Out[8]: 
['81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M',
 '3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
 'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
 '61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M']

In [9]: sorted(list_of_strings, key=lambda s: s.split(',')[1])
Out[9]: 
['3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
 '61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M',
 'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
 '81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M']

Or if you'd rather sort a list in place,

list_of_strings
Out[12]: 
['81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M',
 '3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
 'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
 '61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M']

list_of_strings.sort(key=lambda s: s.split(',')[1])

list_of_strings
Out[14]: 
['3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
 '61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M',
 'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
 '81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M']

Upvotes: 10

Sort A list of Strings Based on certain field

Answers (3)

Related Questions