Reputation: 6835
Let's say I have a dataframe with the following structure:
observation
d1 1
d2 1
d3 -1
d4 -1
d5 -1
d6 -1
d7 1
d8 1
d9 1
d10 1
d11 -1
d12 -1
d13 -1
d14 -1
d15 -1
d16 1
d17 1
d18 1
d19 1
d20 1
Where d1:d20 is some datetime index (generalized here).
If I wanted to split d1:d2, d3:d6, d7:d10, etc into their own respective "chunks", how would I do that pythonically?
Note:
df1 = df[(df.observation==1)]
df2 = df[(df.observation==-1)]
is not what I want.
I can think of brute-force ways that would work, but they are not especially elegant.
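For reference, the example frame above can be reconstructed like this (with string labels `d1`..`d20` standing in for the actual datetime index):

```python
import pandas as pd

# The run-length pattern of 1s and -1s from the question,
# indexed by placeholder labels d1..d20.
values = [1, 1, -1, -1, -1, -1, 1, 1, 1, 1,
          -1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
df = pd.DataFrame({'observation': values},
                  index=['d%d' % i for i in range(1, 21)])
```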
Upvotes: 2
Views: 1600
Reputation: 12515
Here's an example using real datetime.date
objects as indices.
import pandas as pd
import numpy as np
import random
from datetime import date

# Ordinal bounds for the random dates (any range works; these are
# arbitrary sample values, since the bounds must be chosen somehow).
start_date = date(2016, 1, 1).toordinal()
end_date = date(2016, 12, 31).toordinal()

df = pd.DataFrame({'x': np.random.randn(40)},
                  index=[date.fromordinal(random.randint(start_date, end_date))
                         for i in range(40)])
def filter_on_datetime(df, year=None, month=None, day=None):
    if all(d is not None for d in (year, month, day)):
        idxs = [idx for idx in df.index if idx.year == year and idx.month == month and idx.day == day]
    elif year is not None and month is not None and day is None:
        idxs = [idx for idx in df.index if idx.year == year and idx.month == month]
    elif year is not None and month is None and day is None:
        idxs = [idx for idx in df.index if idx.year == year]
    elif year is None and month is not None and day is not None:
        idxs = [idx for idx in df.index if idx.month == month and idx.day == day]
    elif year is None and month is None and day is not None:
        idxs = [idx for idx in df.index if idx.day == day]
    elif year is None and month is not None and day is None:
        idxs = [idx for idx in df.index if idx.month == month]
    elif year is not None and month is None and day is not None:
        idxs = [idx for idx in df.index if idx.year == year and idx.day == day]
    else:
        idxs = df.index
    return df.loc[idxs]
Running this:
>>> print(filter_on_datetime(df = df, year = 2016, month = 2))
x
2016-02-01 -0.141557
2016-02-03 0.162429
2016-02-05 0.703794
2016-02-07 -0.184492
2016-02-09 -0.921793
2016-02-12 1.593838
2016-02-17 2.784899
2016-02-19 0.034721
2016-02-26 -0.142299
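As a side note: if the index is an actual pandas DatetimeIndex (rather than a list of datetime.date objects), the year and year/month cases above reduce to partial-string indexing, with no helper function needed. A minimal sketch with made-up sample dates:

```python
import pandas as pd
import numpy as np

# A frame indexed by a proper DatetimeIndex.
df = pd.DataFrame({'x': np.arange(6.0)},
                  index=pd.to_datetime(['2016-01-15', '2016-02-01',
                                        '2016-02-09', '2016-02-26',
                                        '2016-03-05', '2017-02-14']))

# Partial-string indexing: slice by month or by year directly.
feb_2016 = df.loc['2016-02']   # all rows in February 2016
year_2016 = df.loc['2016']     # all rows in 2016
```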
Upvotes: 0
Reputation: 215137
You can create a group variable from the cumsum()
of the diff()
of the observation
column: wherever the diff() is non-zero a True is produced, so every time the value changes, the cumsum() assigns a new group id. You can then either apply standard analysis after groupby()
with df.groupby((df.observation.diff() != 0).cumsum())...(other chained analysis here)
or split the frame into smaller data frames with a list comprehension
:
lst = [g for _, g in df.groupby((df.observation.diff() != 0).cumsum())]
lst[0]
# observation
#d1 1
#d2 1
lst[1]
# observation
#d3 -1
#d4 -1
#d5 -1
#d6 -1
...
Index chunks here:
[i.index for i in lst]
#[Index(['d1', 'd2'], dtype='object'),
# Index(['d3', 'd4', 'd5', 'd6'], dtype='object'),
# Index(['d7', 'd8', 'd9', 'd10'], dtype='object'),
# Index(['d11', 'd12', 'd13', 'd14', 'd15'], dtype='object'),
# Index(['d16', 'd17', 'd18', 'd19', 'd20'], dtype='object')]
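A minimal sketch of the chained-analysis route, using the sample values from the question, showing the length of each run:

```python
import pandas as pd

# The run-length pattern from the question.
values = [1, 1, -1, -1, -1, -1, 1, 1, 1, 1,
          -1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
df = pd.DataFrame({'observation': values},
                  index=['d%d' % i for i in range(1, 21)])

# Group key: a new id each time the observation changes sign.
grp = (df.observation.diff() != 0).cumsum()

# Example of chained analysis: the size of each chunk.
run_lengths = df.groupby(grp).size()
print(run_lengths.tolist())   # [2, 4, 4, 5, 5]
```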
Upvotes: 6