Nayden Van

Reputation: 1569

Sort rows by timestamp

I know this is supposed to be really easy, but for some reason it doesn't work. I have a CSV file as follows:

message,name,userID,period,@timestamp,event_count
"Successful Logon for user "" user""",Logon Attempt,user,period_1,2021-05-11 09:52:30,1
"Successful Logon for user "" user""",Logon Attempt,user,period_1,2021-05-10 06:04:24,1

I am trying to sort the rows based on the timestamp.

The first thing I did was convert the @timestamp column to datetime, as follows, and sort the values:

f['@timestamp'] = pd.to_datetime(f['@timestamp'], format="%Y-%m-%d %H:%M:%S").sort_values()

But when I run the script, the result is still not sorted by timestamp.

Any advice on what I am doing wrong? Sorry, I am still new to pandas.

EDIT:

Even using:

f['@timestamp'] = pd.to_datetime(f['@timestamp'], format="%Y-%m-%d %H:%M:%S")
f = f.sort_values(by='@timestamp')

The output is always the same: the values are not sorted.

With the updated script, the output is still as follows:

message,name,userID,period,@timestamp,event_count
"Successful Logon for user "" user""",Logon Attempt,user,period_1,2021-05-11 09:52:30,1
"Successful Logon for user "" user""",Logon Attempt,user,period_1,2021-05-10 06:04:24,1

Upvotes: 0

Views: 137

Answers (1)

mozway

Reputation: 261860

It does not matter whether you sort or shuffle the output of pd.to_datetime(…): once it is assigned to your column, it is reordered to match the DataFrame's index.
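A minimal sketch of that alignment, using a hypothetical three-row frame (the column name `date` and the values are only for illustration): the sorted Series keeps its original index labels, so assigning it back puts every value in the row it came from.

```python
import pandas as pd

# Hypothetical toy frame with the default integer index 0, 1, 2.
df = pd.DataFrame({'date': ['2021-07-29', '2000-01-01', '2020-02-01']})

# Sorting the converted Series reorders it, but each value keeps its index label.
sorted_dates = pd.to_datetime(df['date']).sort_values()
print(sorted_dates.index.tolist())  # [1, 2, 0]

# Assignment aligns on those labels, so the column ends up in the original row order.
df['date'] = sorted_dates
print(df['date'].dt.strftime('%Y-%m-%d').tolist())
# ['2021-07-29', '2000-01-01', '2020-02-01']
```

The values are now datetimes, but the row order is unchanged, which is why the one-liner in the question appears to do nothing.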

First assign to the column, then sort:

df['timestamp'] = pd.to_datetime(…)
df = df.sort_values(by='timestamp')

Example

Let's start with an unsorted DataFrame:

>>> df = pd.DataFrame({'date': ['2021-07-29', '2000-01-01', '2020-02-01']})
>>> df
         date
0  2021-07-29
1  2000-01-01
2  2020-02-01

Apply datetime and sort:

>>> df['date'] = pd.to_datetime(df['date'])
>>> df = df.sort_values(by='date')
>>> df
        date
1 2000-01-01
2 2020-02-01
0 2021-07-29

It works fine with your dataset:

df['@timestamp'] = pd.to_datetime(df['@timestamp'])
df = df.sort_values(by='@timestamp')
df
                             message           name userID    period          @timestamp  event_count
1  Successful Logon for user " user"  Logon Attempt   user  period_1 2021-05-10 06:04:24            1
0  Successful Logon for user " user"  Logon Attempt   user  period_1 2021-05-11 09:52:30            1
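For anyone who wants to reproduce this without the original file, here is a self-contained sketch that inlines the two rows from the question through `io.StringIO` (an assumption made for illustration; the question reads them from a CSV file on disk) and applies the same two steps.

```python
import io
import pandas as pd

# The two rows from the question, inlined so the snippet runs on its own.
csv_text = '''message,name,userID,period,@timestamp,event_count
"Successful Logon for user "" user""",Logon Attempt,user,period_1,2021-05-11 09:52:30,1
"Successful Logon for user "" user""",Logon Attempt,user,period_1,2021-05-10 06:04:24,1
'''

f = pd.read_csv(io.StringIO(csv_text))

# Step 1: convert the column in place (row order is unchanged at this point).
f['@timestamp'] = pd.to_datetime(f['@timestamp'], format="%Y-%m-%d %H:%M:%S")

# Step 2: sort the whole frame by that column and keep the result.
f = f.sort_values(by='@timestamp')

print(f.index.tolist())  # [1, 0]
print(f['@timestamp'].is_monotonic_increasing)  # True
```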

Upvotes: 1
