hazrmard

Reputation: 3661

How to concatenate DatetimeIndex objects in pandas?

Let's say I have 3 pandas DatetimeIndex objects:

import pandas as pd

idx1 = pd.date_range('2019-01-01 00:00:00', '2019-01-01 01:00:00', freq='5T')
idx2 = pd.date_range('2019-01-01 02:00:00', '2019-01-01 03:00:00', freq='5T')
idx3 = pd.date_range('2019-01-02 00:00:00', '2019-01-02 01:00:00', freq='5T')

How can I combine them into a single index?

My current workaround is to convert each index to a Series, combine them with pandas.concat, and then convert the result back to a DatetimeIndex:

# This works, but requires type conversions to and from
# intermediate series objects:
combined = pd.concat([idx1.to_series(), idx2.to_series(), idx3.to_series()])
idx = pd.DatetimeIndex(combined.values)

Is there a direct way to combine indices in pandas?

Upvotes: 19

Views: 16183

Answers (4)

Théo Rubenach

Reputation: 574

Following the other answers, I benchmarked the approaches on random DatetimeIndex objects.

Despite creating many intermediate objects, converting to Series, concatenating, and converting back is still by far the fastest option, roughly 100x faster than chained unions when the result is left unsorted. If the result must be sorted, @Hunaphu's solution is quicker but less readable. So, my solution would be:

res = pd.DatetimeIndex(pd.concat([
    pd.Series(dti) 
    for dti in date_time_indexes
])).sort_values()

Benchmark code:

import random

import numpy as np
import pandas as pd

start_range = pd.Timestamp('2022-01-01')
end_range = pd.Timestamp('2022-12-31')

# Build 100 indexes with random start/end dates and frequencies.
date_time_indexes = []
for _ in range(100):
    start_date = pd.Timestamp(random.uniform(start_range.value, end_range.value))
    end_date = pd.Timestamp(random.uniform(start_date.value, end_range.value))
    freq = random.choice(['D', 'H', 'T'])
    dti = pd.date_range(start=start_date, end=end_date, freq=freq)
    date_time_indexes.append(dti)

def cascaded_unions():
    res = date_time_indexes[0]
    for date_time_index in date_time_indexes[1:]:
        res = res.union(date_time_index)
    return res

res1 = pd.DatetimeIndex(pd.concat([pd.Series(dti) for dti in date_time_indexes])) 
# timeit 60.6 ms
res2 = pd.DatetimeIndex(np.unique(np.hstack(date_time_indexes)))
# timeit 194 ms
res3 = pd.DatetimeIndex(pd.concat([pd.Series(dti) for dti in date_time_indexes])).sort_values()
# timeit 417 ms
res4 = cascaded_unions()
# timeit 7.79 s
res5 = date_time_indexes[0].union_many(date_time_indexes[1:])
# timeit 7.86 s and is deprecated

Note that method #1 returns an unsorted DatetimeIndex, hence method #3. Neither of them removes duplicate timestamps, whereas the np.unique and union variants do.
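The timings above can be reproduced with a small harness along these lines. This is only a sketch: the helper names (via_concat, via_unique, via_unions) and the deterministic, overlapping test indexes are assumptions made for illustration, not the data behind the numbers above, so absolute timings will differ.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10 overlapping half-hourly indexes.
base = pd.Timestamp("2022-01-01")
indexes = [
    pd.date_range(base + pd.Timedelta(days=i), periods=100, freq="30min")
    for i in range(10)
]


def via_concat():
    # Series round trip; keeps duplicates, so only sorting is applied.
    combined = pd.concat([pd.Series(dti) for dti in indexes])
    return pd.DatetimeIndex(combined).sort_values()


def via_unique():
    # np.unique sorts and removes duplicates in one step.
    return pd.DatetimeIndex(np.unique(np.hstack(indexes)))


def via_unions():
    # Chained pairwise unions.
    res = indexes[0]
    for dti in indexes[1:]:
        res = res.union(dti)
    return res


for fn in (via_concat, via_unique, via_unions):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```

Because the concat variant keeps duplicates, calling .unique() on its result is needed before it can be compared with the other two.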

Upvotes: 3

Hunaphu

Reputation: 701

In my case this is what I needed:

ixfull = pd.DatetimeIndex(np.unique(np.hstack(ix_list)))

This works because union(a, b) = a + b - intersection(a, b) = unique(a + b). A more verbose version:

fastunion = lambda x: np.unique(np.hstack(x))
def ixlist2ix(ix_list):
    ixunion = fastunion(ix_list)
    ixfull = pd.DatetimeIndex(ixunion)
    return ixfull
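To illustrate the identity above, here is a small sketch with two overlapping, tz-naive indexes (a and b are made-up names for this example). np.unique both removes the shared timestamps and sorts the result, so for these inputs it matches what union returns.

```python
import numpy as np
import pandas as pd

a = pd.date_range("2019-01-01", periods=3, freq="D")  # Jan 1, 2, 3
b = pd.date_range("2019-01-02", periods=3, freq="D")  # Jan 2, 3, 4

stacked = np.hstack([b, a])  # 6 values; Jan 2 and Jan 3 appear twice
merged = pd.DatetimeIndex(np.unique(stacked))

print(len(stacked), len(merged))   # 6 4
print(merged.equals(a.union(b)))   # True
```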

Upvotes: 3

Pontus Hultkrantz

Reputation: 480

combined = idx1.union_many([idx2, idx3, ...])

Although the pandas documentation for this method says:

A bit of a hack to accelerate unioning a collection of indexes

Upvotes: 4

jose_bacoy

Reputation: 12684

Try the union method:

combined = idx1.union(idx2).union(idx3)
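For an arbitrary number of indexes, the chained call can be folded with functools.reduce. If a plain concatenation is wanted instead (original order, duplicates kept), Index.append also accepts a list of indexes. A minimal sketch using the indexes from the question, with '5min' written in place of '5T' because newer pandas versions deprecate the 'T' alias; the names indexes and appended are introduced here for illustration:

```python
from functools import reduce

import pandas as pd

idx1 = pd.date_range("2019-01-01 00:00:00", "2019-01-01 01:00:00", freq="5min")
idx2 = pd.date_range("2019-01-01 02:00:00", "2019-01-01 03:00:00", freq="5min")
idx3 = pd.date_range("2019-01-02 00:00:00", "2019-01-02 01:00:00", freq="5min")
indexes = [idx1, idx2, idx3]

# Set union folded over the list: duplicates are removed.
combined = reduce(lambda left, right: left.union(right), indexes)

# Plain concatenation: keeps the given order and any duplicates.
appended = idx1.append([idx2, idx3])

print(len(combined), len(appended))
```

For these three disjoint, already ordered ranges both results hold the same timestamps; they differ once the inputs overlap or arrive out of order.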

Upvotes: 26
