Alex

Reputation: 3431

Sorting a list of strings based on the first element from split (datetime)

I have a long list of strings, each one a comma-separated record (basically CSV files read line by line into strings, without splitting on the separator):

lines[0] = "2017-08-01 13:45:58,mytext,mytext2,mytext3,etc"
lines[1] = "2017-08-01 15:45:58,mytextx,mytext2x,mytext3x,etcx"
lines[2] = "2017-08-01 19:45:58,mytexty,mytext2y,mytext3y,etcy"
lines[3] = "..."

From this post I know that the following code should work if my lines consisted only of datetimes:

lines_sorted = sorted(lines, key=lambda x: datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

I thought I could use partition to turn every line of each file into a tuple whose first element contains the datetime part:

for unsortedFile in glob('*.txt'):
    with open(unsortedFile, 'r') as file:
        lines = [line.rstrip('\n').partition(',') for line in file]
        lines_sorted = sorted(lines, key=lambda x: datetime.datetime.strptime(lines[0], '%Y-%m-%d %H:%M:%S'))

But of course this does not work ("TypeError: list indices must be integers or slices, not str"), because lines[0] refers to the first item of the lines list, not to the first element of the tuple currently being compared. I also tried .strptime(lines[lambda][0], '%Y-%m-%d %H:%M:%S'), but that does not work either.
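
For illustration, here is a minimal sketch of what partition returns and how the key function can use it. The sample rows are made up, and the idea is only that the lambda should index its own argument x rather than the whole lines list:

```python
import datetime

# Hypothetical sample rows, deliberately out of order.
lines = [
    "2017-08-01 19:45:58,mytexty,mytext2y",
    "2017-08-01 13:45:58,mytext,mytext2",
    "2017-08-01 15:45:58,mytextx,mytext2x",
]

# partition(',') returns a 3-tuple: (text before, ',', text after).
parts = [line.partition(',') for line in lines]

# sorted passes one tuple at a time to the lambda as x,
# so x[0] is the timestamp string of that row.
parts_sorted = sorted(
    parts,
    key=lambda x: datetime.datetime.strptime(x[0], '%Y-%m-%d %H:%M:%S'),
)

# Join each tuple back into the original line.
lines_sorted = [''.join(p) for p in parts_sorted]
print(lines_sorted)
```

Joining the three tuple elements restores each line exactly, because partition keeps the separator as the middle element.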

I know I am doing something wrong. Any help is much appreciated.

[edit] Here's the answer, from friendly comments below:

from glob import glob

for unsortedFile in glob('*.txt'):
    # Read each unsorted file into a list of lines.
    with open(unsortedFile, 'r', encoding="utf8") as file:
        lines = [line.rstrip('\n') for line in file]
    # Sort by the text before the first comma (the timestamp).
    lines_sorted = sorted(lines, key=lambda x: x.split(',', maxsplit=1)[0])
    # Overwrite the file with the sorted lines.
    with open(unsortedFile, 'w', encoding="utf8") as file:
        for line in lines_sorted:
            file.write(line + '\n')

Upvotes: 0

Views: 643

Answers (2)

User9123

Reputation: 606

Basically, the key argument of the sorted function must be a function that takes a single list item and returns a comparable object.
sorted orders the list by the values this function returns for the items, not by the items themselves.

Here is an example that combines the suggested solutions:

lines_sorted = sorted(lines,
                      key=lambda x: x.split(',', maxsplit=1)[0]
                     )

With this code, sorted treats all items that share the same timestamp (the text before the first comma) as equal.
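
As a rough sketch of what "equal" means here (the sample rows and the helper name timestamp_key are made up for illustration): Python's sorted is stable, so rows with the same key keep their original relative order.

```python
# Hypothetical rows; two of them share the same timestamp.
lines = [
    "2017-08-01 15:45:58,b",
    "2017-08-01 13:45:58,z",
    "2017-08-01 15:45:58,a",
]

def timestamp_key(line):
    # Everything before the first comma; maxsplit=1 stops after one split.
    return line.split(',', maxsplit=1)[0]

lines_sorted = sorted(lines, key=timestamp_key)

# The two 15:45:58 rows stay in their input order ("b" before "a"),
# because only the timestamp is compared, never the rest of the line.
print(lines_sorted)
```

Comparing the timestamps as plain strings works for this format because the fields are zero-padded and ordered from year down to second.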

Upvotes: 1

Netwave

Reputation: 42746

Just take the first element of the split:

import datetime

lines_sorted = sorted(
    lines,
    key=lambda x: datetime.datetime.strptime(x.split(",")[0],
                                             '%Y-%m-%d %H:%M:%S'),
)

This way only the datetime is used for sorting, while the original lines are kept unchanged.
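
A small self-contained sketch of this approach, with made-up sample rows and a hypothetical helper named parsed_key. It also compares the result with sorting on the raw timestamp string, which should give the same order for this particular format:

```python
import datetime

FORMAT = '%Y-%m-%d %H:%M:%S'

# Hypothetical rows spanning several days, deliberately out of order.
lines = [
    "2017-09-02 08:00:00,c,extra",
    "2017-08-01 19:45:58,b,extra",
    "2017-08-01 13:45:58,a,extra",
]

def parsed_key(line):
    # Split once at the first comma and parse the timestamp field.
    first_field = line.split(',', 1)[0]
    return datetime.datetime.strptime(first_field, FORMAT)

by_datetime = sorted(lines, key=parsed_key)
by_string = sorted(lines, key=lambda line: line.split(',', 1)[0])

print(by_datetime)
print(by_datetime == by_string)
```

Parsing costs a little more per line, but strptime raises ValueError on a first field that does not match the format, which can help catch malformed rows early.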

Upvotes: 2
