J H
J H

Reputation: 11

If I change a Pandas Periodindex into a String valued index, what am I giving up?

A disclaimer at the start: I am not really a programmer but do have some ability to hack some things together and get a reasonable result. But my knowledge is very much a case of— get a solution to the problem at hand, find the tree but not really take a look at the forest.

I'm trying to put together a database for playing with ML and AI models in python. The data has heterogeneous frequencies:

Given that some of the work I need to do is related to trying to forecast estimates for things like quarterly earnings or other financial data elements from balance sheets or income statements, and macro data like GNP, that means classifying daily, weekly or monthly data into a quarterly bucket is needed.

I was starting to use the Pandas Periodindex function and that did what I wanted. But if I want to do something like filter a dataframe for only Q1 data for all the years turns out it is not as simple as "where 'Q1' in df.index, do something". Looks like I can do what I want by jumping though some additional steps, but I'm thinking of simply making the index a string value rather than a Period value (just using the .astype(str) when using the Periodindex() call).

Would I be giving up some functionality I'm not seeing at the moment but might really like having access to by changing the index?

Below is a snippet of what I'm talking about:

...
qBalSh.index = pd.PeriodIndex(qBalSh.fiscalDateEnding, freq='Q').astype(str)
qBalSh.index.name = 'Quarter'
qIncSt.index = pd.PeriodIndex(qIncSt.fiscalDateEnding, freq='Q')
qIncSt.index.name = 'Quarter'
...

Upvotes: 0

Views: 46

Answers (1)

wjandrea
wjandrea

Reputation: 33145

PeriodIndex has time-specific features. See the user guide: Time series / date functionality

In any case, you don't need to set the index as string, since:

A) PeriodIndex provides a .quarter attribute

qIncSt[qIncSt.index.quarter == 1]

B) Even if it didn't, you can always do .astype(str) on-demand and use that for filtering.

qIncSt[qIncSt.index.astype(str).str.contains('Q1')]

Example

months = ["2023-01", "2023-02", "2023-03", "2023-04", "2023-05"]
qIncSt = pd.DataFrame({
    'fiscalDateEnding': pd.to_datetime(months),
    'foobar': range(len(months))})
qIncSt.index = pd.PeriodIndex(
    qIncSt['fiscalDateEnding'],
    freq='Q',
    name='Quarter')
>>> qIncSt
        fiscalDateEnding  foobar
Quarter                         
2023Q1        2023-01-01       0
2023Q1        2023-02-01       1
2023Q1        2023-03-01       2
2023Q2        2023-04-01       3
2023Q2        2023-05-01       4

>>> qIncSt[qIncSt.index.quarter == 1]
        fiscalDateEnding  foobar
Quarter                         
2023Q1        2023-01-01       0
2023Q1        2023-02-01       1
2023Q1        2023-03-01       2

>>> qIncSt[qIncSt.index.astype(str).str.contains('Q1')]
        fiscalDateEnding  foobar
Quarter                         
2023Q1        2023-01-01       0
2023Q1        2023-02-01       1
2023Q1        2023-03-01       2

Upvotes: 0

Related Questions