Andrew

Reputation: 165

Group by list of different time ranges in Pandas

Edit: Changing example to use Timedelta indices.

I have a DataFrame of time ranges, where each row gives a Start and Stop label into the index of my main DataFrame. For example:

import numpy as np
import pandas as pd

ranges = pd.DataFrame(data=np.array([[1,10,20],[3,15,30]]).T, columns=["Start","Stop"])
ranges = ranges.apply(pd.to_timedelta, unit="s")
ranges
            Start            Stop
0 0 days 00:00:01 0 days 00:00:03
1 0 days 00:00:10 0 days 00:00:15
2 0 days 00:00:20 0 days 00:00:30

my_data = pd.DataFrame(data=list(range(0,40*5,5)), columns=["data"])
my_data.index = pd.to_timedelta(my_data.index, unit="s")

I want to calculate the averages of the data in my_data for each of the time ranges in ranges. How can I do this?

One option is to slice with .loc, which includes both endpoints, and then drop the last row so that Stop is excluded:

ranges.apply(lambda row: my_data.loc[row["Start"]:row["Stop"]].iloc[:-1].mean(), axis=1)
    data
0    7.5
1   60.0
2  122.5
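
To make the role of `.iloc[:-1]` concrete, here is a minimal sketch of the slice for the first range only, on the same my_data. The names start, stop, inclusive and half_open are only for illustration:

```python
import pandas as pd

# Same data as above: 40 values in steps of 5, indexed by 0..39 seconds
my_data = pd.DataFrame({"data": range(0, 40 * 5, 5)})
my_data.index = pd.to_timedelta(my_data.index, unit="s")

start, stop = pd.Timedelta(seconds=1), pd.Timedelta(seconds=3)

# Label-based slicing keeps both endpoints: rows at 1s, 2s and 3s
inclusive = my_data.loc[start:stop]
print(inclusive["data"].tolist())   # [5, 10, 15]

# Dropping the last row leaves the half-open range [1s, 3s)
half_open = inclusive.iloc[:-1]
print(half_open["data"].mean())     # 7.5
```

This relies on Stop being an actual label in the index. If it is not, `.iloc[:-1]` drops a row that belongs to the range.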

But can we do this without apply?

Upvotes: 1

Views: 243

Answers (1)

sammywemmy

Reputation: 28644

Here is one way to approach it:

Build a timedelta range for each row of ranges and concatenate them into a single index:

# note the use of closed='left' (`Stop` is not included in the build)
timedelta = [pd.timedelta_range(a,b, closed='left', freq='1s')
             for a, b in zip(ranges.Start, ranges.Stop)]

timedelta = timedelta[0].append(timedelta[1:])

Build the group labels for the groupby, one integer per range, repeated once for each second in that range:

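# total_seconds() rather than .dt.seconds, which only returns the seconds
# component and would be wrong for ranges of a day or longer
counts = ranges.Stop.sub(ranges.Start).dt.total_seconds().astype(int)
counts = np.arange(counts.size).repeat(counts)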
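
As a rough illustration of what these labels look like for the example ranges, the sketch below rebuilds them on their own. The names lengths and labels are introduced here only for illustration, and the range length is taken with total_seconds() so that it also holds for ranges of a day or longer:

```python
import numpy as np
import pandas as pd

ranges = pd.DataFrame({"Start": [1, 10, 20], "Stop": [3, 15, 30]})
ranges = ranges.apply(pd.to_timedelta, unit="s")

# Number of one-second steps in each half-open range [Start, Stop)
lengths = (ranges["Stop"] - ranges["Start"]).dt.total_seconds().astype(int).to_numpy()
print(lengths.tolist())   # [2, 5, 10]

# One label per selected row: 0 twice, 1 five times, 2 ten times
labels = np.arange(len(lengths)).repeat(lengths)
print(labels.tolist())
```

The labels line up with the concatenated timedelta index position by position, which is what lets groupby assign each selected row to its range.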

Group by and aggregate:

my_data.loc[timedelta].groupby(counts).mean()

    data
0    7.5
1   60.0
2  122.5 
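
A different technique that may also work here, and that does not require the data to sit on a regular one-second grid, is to treat the ranges as an IntervalIndex and look up which interval each row of my_data falls into. This is a sketch of that alternative rather than the approach above; it assumes the ranges do not overlap, and the names bins, codes, inside and result are only for illustration:

```python
import pandas as pd

ranges = pd.DataFrame({"Start": [1, 10, 20], "Stop": [3, 15, 30]})
ranges = ranges.apply(pd.to_timedelta, unit="s")

my_data = pd.DataFrame({"data": range(0, 40 * 5, 5)})
my_data.index = pd.to_timedelta(my_data.index, unit="s")

# Half-open intervals [Start, Stop), matching closed='left' above
bins = pd.IntervalIndex.from_arrays(ranges["Start"], ranges["Stop"], closed="left")

# Position of the interval containing each index value, -1 if none does
codes = bins.get_indexer(my_data.index)
inside = codes >= 0

# Keep only rows that fall in some range and average per range
result = my_data[inside].groupby(codes[inside]).mean()
print(result)
```

On the example data this should give the same three means as above. get_indexer raises an error when the intervals overlap, so overlapping ranges would need another approach.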

Upvotes: 2
