Reputation: 129
QUESTION
I have been looking around for an answer, but most posts deal with the difference in days between two dates, not the difference in value between two dates.
Assume the following code:
import pandas as pd
import numpy as np
import datetime
np.random.seed(15)
day = datetime.date.today()
day_1 = datetime.date.today() - datetime.timedelta(1)
day_2 = datetime.date.today() - datetime.timedelta(2)
day_3 = datetime.date.today() - datetime.timedelta(3)
ticker_date = [('fi', day), ('fi', day_1), ('fi', day_2), ('fi', day_3),
               ('di', day), ('di', day_1), ('di', day_2), ('di', day_3)]
index_df = pd.MultiIndex.from_tuples(ticker_date, names=['lvl_1', 'lvl_2'])
df = pd.DataFrame(np.random.rand(8), index_df, ['value'])
Output:
                     value
lvl_1 lvl_2
fi    2018-02-15  0.848818
      2018-02-14  0.178896
      2018-02-13  0.054363
      2018-02-12  0.361538
di    2018-02-15  0.275401
      2018-02-14  0.530000
      2018-02-13  0.305919
      2018-02-12  0.304474
I am looking for a way to group by 'lvl_1' and then get the difference in value between two given dates.
For instance, the difference between February 14th and February 12th would be about -0.1826 for 'fi' and 0.225526 for 'di'.
I was working with the following lines of code:
group_by = df.groupby(level='lvl_1')
nd = group_by.get_loc(day_3, method='nearest')
st = group_by.get_loc(day_1, method='nearest')
out = group_by.iloc[nd] - group_by.iloc[st]
But get_loc does not appear to be a valid method on a groupby object:
AttributeError: 'DataFrameGroupBy' object has no attribute 'get_loc'
Any suggestions?
Upvotes: 0
Views: 213
Reputation: 129
ANSWER:
I found a way to answer my own question. Assuming I am looking for the location of a single given day (the same idea then extends to my specific question):
group_by = df.groupby(level='lvl_1')
ans = group_by.nth(df.index.get_level_values('lvl_2').unique().get_loc(day_2, method='nearest'))
Ideally, I would look up the location within each group separately, since the dates could differ from one group to the next. However, I am having a hard time figuring out the last step:
group_by = df.groupby(level='lvl_1')
loc = group_by.apply(lambda x: x.index.get_level_values('lvl_2').unique().get_loc(day_2, method='nearest'))
ans = group_by.nth(loc.groupby(level='lvl_1'))
But the last line raises an error:
TypeError: n needs to be an int or a list/set/tuple of ints
If someone finds a way to solve this remaining issue, please share. In the meantime, my temporary answer does the job. Thanks.
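For what it is worth, the TypeError comes from nth expecting an integer (or a list of integers), while the last line hands it a groupby object. Below is a minimal sketch of one way around that, assuming the goal is the value at the date nearest to a target within each group. The helper name value_nearest and the small fixed-date frame are made up for illustration. The nearest position is computed with np.abs(...).argmin() rather than get_loc(..., method='nearest'), because as far as I know recent pandas versions no longer accept the method argument of get_loc.

```python
import datetime

import numpy as np
import pandas as pd


def value_nearest(group, target):
    """Return the 'value' whose lvl_2 date is closest to target in one group."""
    dates = pd.to_datetime(group.index.get_level_values('lvl_2'))
    # Absolute distance of every date in this group to the target date.
    deltas = np.abs((dates - pd.Timestamp(target)).to_numpy())
    return group['value'].iloc[deltas.argmin()]


# Fixed dates (instead of date.today()) so the sketch is reproducible.
day = datetime.date(2018, 2, 15)
day_1 = datetime.date(2018, 2, 14)
day_3 = datetime.date(2018, 2, 12)

# Hypothetical data: 'di' has no row for day_1, so the nearest date is used.
index_df = pd.MultiIndex.from_tuples(
    [('fi', day), ('fi', day_1), ('fi', day_3),
     ('di', day), ('di', day_3)],
    names=['lvl_1', 'lvl_2'])
df = pd.DataFrame({'value': [0.8, 0.2, 0.4, 0.3, 0.6]}, index=index_df)

group_by = df.groupby(level='lvl_1')
at_day_1 = group_by.apply(value_nearest, target=day_1)
at_day_3 = group_by.apply(value_nearest, target=day_3)
diff = at_day_3 - at_day_1  # one difference per lvl_1 group
```

Because the lookup happens inside each group, the groups do not need to share the same set of dates. If two dates are equally close to the target, argmin picks the first one in the group's row order.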
Upvotes: 0
Reputation: 1879
This is a bit different from yours in spirit, but it should give what you want (although it might waste memory if your DataFrame is very large, since it builds a wide table with one column per date):
expanded = df.reset_index().pivot_table(index='lvl_1', columns='lvl_2', values='value')
expanded[day_3] - expanded[day_1]
This returns a Series with the difference:
lvl_1
di   -0.225526
fi    0.182643
dtype: float64
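If memory is a concern, a possible variant is to skip the wide table and take one cross-section per date with xs, then subtract. This is only a sketch and assumes both dates are present exactly in every group (xs does no nearest-date matching). The values are copied from the printed output in the question rather than regenerated from the random seed, and the dates are fixed to match that output.

```python
import datetime

import pandas as pd

# Fixed dates matching the printed output in the question.
day = datetime.date(2018, 2, 15)
day_1 = datetime.date(2018, 2, 14)
day_2 = datetime.date(2018, 2, 13)
day_3 = datetime.date(2018, 2, 12)

ticker_date = [('fi', day), ('fi', day_1), ('fi', day_2), ('fi', day_3),
               ('di', day), ('di', day_1), ('di', day_2), ('di', day_3)]
index_df = pd.MultiIndex.from_tuples(ticker_date, names=['lvl_1', 'lvl_2'])
values = [0.848818, 0.178896, 0.054363, 0.361538,
          0.275401, 0.530000, 0.305919, 0.304474]
df = pd.DataFrame({'value': values}, index=index_df)

# Each xs call returns a Series indexed by lvl_1 for a single date.
at_day_3 = df['value'].xs(day_3, level='lvl_2')
at_day_1 = df['value'].xs(day_1, level='lvl_2')

# Subtraction aligns on lvl_1, giving one difference per group.
diff = at_day_3 - at_day_1
```

The result is again a Series indexed by lvl_1, with the same sign convention as expanded[day_3] - expanded[day_1].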
Upvotes: 1