Daisy Guitron

Reputation: 43

sort dataframe with dates as column headers in pandas

My dates have to be in water years, and I want the columns to start with the date 09/30/1899_24:00 and end with the date 9/30/1999_24:00.

Initially the dates were in the right order, but when I pivoted the dataframe the order got messed up.

Here is a snippet of my code:

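    import pandas as pd

    # headout (CSV path) and sta (node table with NodeID and GSE columns) are defined elsewhere
    sim = pd.read_csv(headout, parse_dates=True, index_col='date')
    # number the repeated rows of each date as layers L1, L2, ...
    sim['Layer'] = sim.groupby('date').cumcount() + 1
    sim['Layer'] = 'L' + sim['Layer'].astype(str)
    # pivot the layers into columns, then transpose so the dates become column headers
    sim = sim.pivot(index=None, columns='Layer').T
    sim = sim.reset_index()
    sim = sim.rename(columns={"level_0": "NodeID"})
    sim["NodeID"] = sim['NodeID'].astype('int64')
    # look up each node's GSE value from sta
    sim['gse'] = sim['NodeID'].map(sta.set_index(['NodeID'])['GSE'])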

Upvotes: 0

Views: 1335

Answers (1)

Trenton McKinney

Reputation: 62383

The issue is that 24:00 is not a valid time (hours run from 00 to 23), so the date strings can't be parsed as datetimes as they are.

  • If you don't convert the date column to a valid datetime, pandas will treat the values as strings.
    • This makes any kind of time-based analysis very difficult.
    • The columns will then be sorted lexicographically (character by character, not by date), as follows: '09/30/1899_24:00', '10/31/1899_24:00', '11/30/1898_24:00', '11/30/1899_24:00'
    • Note that 11/30/1898 is chronologically the earliest of these dates, yet it sorts after 10/31/1899.
  • Replace 24:00 with 23:59, then convert the column with pd.to_datetime.
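
The ordering problem described in the bullets can be reproduced with a handful of labels. This is a minimal sketch; the 11/30/1898 label is hypothetical and only included to show the effect. Sorting the raw strings places it after 10/31/1899, while replacing the hour and parsing first gives chronological order.

```python
import pandas as pd

# hypothetical column labels in the question's format, deliberately out of order
labels = ['11/30/1899_24:00', '09/30/1899_24:00', '11/30/1898_24:00', '10/31/1899_24:00']

# plain string sort: compared character by character, so the month outranks the year
string_order = sorted(labels)
print(string_order)
# ['09/30/1899_24:00', '10/31/1899_24:00', '11/30/1898_24:00', '11/30/1899_24:00']

# replace the invalid hour, parse, then sort: chronological order
parsed = pd.to_datetime([s.replace('24:00', '23:59') for s in labels],
                        format='%m/%d/%Y_%H:%M')
date_order = parsed.sort_values()
print(date_order[0])  # 1898-11-30 23:59:00
```
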
    import pandas as pd

    # sample dataframe
    df = pd.DataFrame({'date': ['09/30/1899_24:00', '09/30/1899_24:00', '09/30/1899_24:00', '09/30/1899_24:00', '10/31/1899_24:00',
                                '10/31/1899_24:00', '10/31/1899_24:00', '10/31/1899_24:00', '11/30/1899_24:00', '11/30/1899_24:00']})

|    | date             |
|---:|:-----------------|
|  0 | 09/30/1899_24:00 |
|  1 | 09/30/1899_24:00 |
|  2 | 09/30/1899_24:00 |
|  3 | 09/30/1899_24:00 |
|  4 | 10/31/1899_24:00 |
|  5 | 10/31/1899_24:00 |
|  6 | 10/31/1899_24:00 |
|  7 | 10/31/1899_24:00 |
|  8 | 11/30/1899_24:00 |
|  9 | 11/30/1899_24:00 |

    # replace the invalid 24:00 with 23:59
    df.date = df.date.str.replace('24:00', '23:59')

    # parse as datetime
    df.date = pd.to_datetime(df.date, format='%m/%d/%Y_%H:%M')

    # result
                     date
    0 1899-09-30 23:59:00
    1 1899-09-30 23:59:00
    2 1899-09-30 23:59:00
    3 1899-09-30 23:59:00
    4 1899-10-31 23:59:00
    5 1899-10-31 23:59:00
    6 1899-10-31 23:59:00
    7 1899-10-31 23:59:00
    8 1899-11-30 23:59:00
    9 1899-11-30 23:59:00
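
Applied to the question's pivot, the same idea is to parse the dates before they become column headers, then sort the columns with sort_index(axis=1). This is a sketch, not the asker's actual data: the small frame below (node columns '1' and '2', two layers per date) is a hypothetical stand-in for the CSV, and its layout is an assumption. The pivot omits index=None and relies on the default of using the existing index.

```python
import pandas as pd

# hypothetical long-format data shaped like the question's file:
# one row per (date, layer), one column per node ID
raw = pd.DataFrame({
    'date': ['10/31/1899_24:00', '10/31/1899_24:00',
             '09/30/1899_24:00', '09/30/1899_24:00',
             '11/30/1898_24:00', '11/30/1898_24:00'],
    '1': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    '2': [20.0, 21.0, 22.0, 23.0, 24.0, 25.0],
})

# fix the invalid hour and parse BEFORE the dates become column labels
raw['date'] = pd.to_datetime(raw['date'].str.replace('24:00', '23:59'),
                             format='%m/%d/%Y_%H:%M')
raw = raw.set_index('date')

# number the layers within each date, as in the question's code
raw['Layer'] = 'L' + (raw.groupby('date').cumcount() + 1).astype(str)

# pivot layers into columns, transpose so the dates are the column headers,
# then sort those headers chronologically
wide = raw.pivot(columns='Layer').T.sort_index(axis=1)
print(wide.columns[0], wide.columns[-1])
# 1898-11-30 23:59:00 1899-10-31 23:59:00
```

Because the headers are real timestamps, sort_index(axis=1) orders them by date rather than by their text.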

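Alternatively, remove the time component entirely:

    # starting from the original strings, drop the '_24:00' suffix and parse the date only
    df.date = df.date.str.replace('_24:00', '')
    df.date = pd.to_datetime(df.date, format='%m/%d/%Y')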

            date
    0 1899-09-30
    1 1899-09-30
    2 1899-09-30
    3 1899-09-30
    4 1899-10-31
    5 1899-10-31
    6 1899-10-31
    7 1899-10-31
    8 1899-11-30
    9 1899-11-30
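
For the water-year requirement in the question, once the dates are datetimes the desired span can be selected directly. This is a sketch assuming the US water-year convention (Oct 1 through Sep 30, named for the calendar year in which it ends); the sample dates are hypothetical.

```python
import pandas as pd

# hypothetical dates as they look after dropping the '_24:00' suffix and parsing
dates = pd.Series(pd.to_datetime(['09/30/1899', '10/31/1899', '11/30/1899',
                                  '09/30/1999', '10/31/1999'],
                                 format='%m/%d/%Y'))

# ASSUMPTION: US convention, water year N runs from Oct 1 of N-1 through Sep 30 of N
water_year = dates.dt.year + (dates.dt.month >= 10).astype(int)
print(water_year.tolist())  # [1899, 1900, 1900, 1999, 2000]

# keep only dates from 1899-09-30 through 1999-09-30 (both ends inclusive)
start, end = pd.Timestamp('1899-09-30'), pd.Timestamp('1999-09-30')
in_range = dates[dates.between(start, end)]
```
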

Upvotes: 1

Related Questions