Reputation: 3661
Let's say I have 3 pandas DatetimeIndex objects:
import pandas as pd
# '5min' is the current alias for a 5-minute frequency ('5T' is deprecated)
idx1 = pd.date_range('2019-01-01 00:00:00', '2019-01-01 01:00:00', freq='5min')
idx2 = pd.date_range('2019-01-01 02:00:00', '2019-01-01 03:00:00', freq='5min')
idx3 = pd.date_range('2019-01-02 00:00:00', '2019-01-02 01:00:00', freq='5min')
How can I combine them into a single index?
The roundabout way I do it is to first convert them to Series, combine those with the pandas.concat function, and then convert the result back to a DatetimeIndex:
# This works, but requires type conversions to and from
# intermediate series objects:
combined = pd.concat([idx1.to_series(), idx2.to_series(), idx3.to_series()])
idx = pd.DatetimeIndex(combined.values)
Is there a direct way to combine indices in pandas?
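For reference, pandas' Index.append accepts a list of indexes and produces the same concatenation directly, without the round trip through Series. A minimal sketch, assuming the three ranges from the question:

```python
import pandas as pd

# The three ranges from the question (13 five-minute timestamps each).
idx1 = pd.date_range('2019-01-01 00:00:00', '2019-01-01 01:00:00', freq='5min')
idx2 = pd.date_range('2019-01-01 02:00:00', '2019-01-01 03:00:00', freq='5min')
idx3 = pd.date_range('2019-01-02 00:00:00', '2019-01-02 01:00:00', freq='5min')

# Index.append takes another index or a list of indexes and returns a new
# index holding all values in input order; nothing is sorted or deduplicated.
combined = idx1.append([idx2, idx3])

print(type(combined).__name__, len(combined))  # DatetimeIndex 39
```

Because these three ranges neither overlap nor arrive out of order, the result here matches a union; with overlapping inputs, append keeps the duplicates while union removes them.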
Upvotes: 19
Views: 16183
Reputation: 574
Following other answers, I did some speed benchmarking with random DateTimeIndexes.
Looks like despite the creation of numerous objects, casting to Series
, concat
and casting back is still the fastest option by far (x100 if unsorted) compared to unions
. For sorted values, the solution of @Hunaphu is quicker but less readable. So, my solution would be:
res = pd.DatetimeIndex(pd.concat([
    pd.Series(dti)
    for dti in date_time_indexes
])).sort_values()
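One difference from the union-based answers is easy to miss: concatenating keeps values that appear in more than one index, while union drops them. A small sketch with two hypothetical overlapping daily ranges (a and b are illustrative names, not from the answer):

```python
import pandas as pd

# Hypothetical overlapping ranges: 2019-01-02 and 2019-01-03 appear in both.
a = pd.date_range('2019-01-01', periods=3, freq='D')
b = pd.date_range('2019-01-02', periods=3, freq='D')
date_time_indexes = [a, b]

# The concat-and-sort approach from this answer: duplicates are kept.
res = pd.DatetimeIndex(pd.concat([pd.Series(dti) for dti in date_time_indexes])).sort_values()

# Dropping duplicates yields the same values as a set-style union.
deduped = res.drop_duplicates()
union = a.union(b)

print(len(res), len(deduped), deduped.equals(union))  # 6 4 True
```

If a true union is what you need, appending drop_duplicates() to the sorted result covers that case.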
Benchmark code:
import random

import numpy as np
import pandas as pd

start_range = pd.Timestamp('2022-01-01')
end_range = pd.Timestamp('2022-12-31')
date_time_indexes = []
for _ in range(100):
    # Random start/end (nanoseconds since the epoch) inside 2022.
    start_date = pd.Timestamp(int(random.uniform(start_range.value, end_range.value)))
    end_date = pd.Timestamp(int(random.uniform(start_date.value, end_range.value)))
    freq = random.choice(['D', 'h', 'min'])  # daily, hourly or minutely
    dti = pd.date_range(start=start_date, end=end_date, freq=freq)
    date_time_indexes.append(dti)

def cascaded_unions():
    res = date_time_indexes[0]
    for date_time_index in date_time_indexes[1:]:
        res = res.union(date_time_index)
    return res

res1 = pd.DatetimeIndex(pd.concat([pd.Series(dti) for dti in date_time_indexes]))
# timeit 60.6 ms
res2 = pd.DatetimeIndex(np.unique(np.hstack(date_time_indexes)))
# timeit 194 ms
res3 = pd.DatetimeIndex(pd.concat([pd.Series(dti) for dti in date_time_indexes])).sort_values()
# timeit 417 ms
res4 = cascaded_unions()
# timeit 7.79 s
res5 = date_time_indexes[0].union_many(date_time_indexes[1:])
# timeit 7.86 s; union_many is deprecated and missing from newer pandas versions
Note that method #1 returns an unsorted DatetimeIndex, hence method #3. Both keep duplicate timestamps, which the union-based methods (#2, #4, #5) remove.
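To rerun a comparison like this outside IPython's %timeit, a rough sketch with the standard timeit module could look like the following. It assumes a smaller, seeded workload (20 ranges, daily or hourly) so it finishes quickly, swaps the deprecated union_many for functools.reduce over Index.union, and first checks that the approaches agree once duplicates are dropped. The function names are illustrative, and its timings will not match the numbers above.

```python
import random
import timeit
from functools import reduce

import numpy as np
import pandas as pd

# Smaller, seeded workload than the benchmark above (an assumption for speed).
random.seed(0)
start_range = pd.Timestamp('2022-01-01')
end_range = pd.Timestamp('2022-12-31')
date_time_indexes = []
for _ in range(20):
    start = pd.Timestamp(int(random.uniform(start_range.value, end_range.value)))
    end = pd.Timestamp(int(random.uniform(start.value, end_range.value)))
    date_time_indexes.append(pd.date_range(start, end, freq=random.choice(['D', 'h'])))

def concat_sorted():
    # Series round trip, then sort (duplicates kept).
    return pd.DatetimeIndex(pd.concat([pd.Series(d) for d in date_time_indexes])).sort_values()

def numpy_unique():
    # Flatten to one datetime64 array, then sort and deduplicate.
    return pd.DatetimeIndex(np.unique(np.hstack(date_time_indexes)))

def cascaded_unions():
    # Fold Index.union over the list (stands in for union_many).
    return reduce(lambda acc, d: acc.union(d), date_time_indexes)

# concat keeps duplicates, so deduplicate it before comparing with the unions.
reference = cascaded_unions()
assert concat_sorted().drop_duplicates().equals(reference)
assert numpy_unique().equals(reference)

for fn in (concat_sorted, numpy_unique, cascaded_unions):
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{fn.__name__}: {best * 1000:.1f} ms per call')
```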
Upvotes: 3
Reputation: 701
In my case this is what I needed:
ixfull = pd.DatetimeIndex(np.unique(np.hstack(ix_list)))
This works because union(a, b) = a + b - intersection(a, b) = unique(a + b), where + means concatenation. A more verbose version:
import numpy as np
import pandas as pd

def fastunion(x):
    return np.unique(np.hstack(x))  # flatten, then sort and deduplicate

def ixlist2ix(ix_list):
    ixunion = fastunion(ix_list)
    ixfull = pd.DatetimeIndex(ixunion)
    return ixfull
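A quick check of the identity above, using two hypothetical overlapping daily ranges (a and b are illustrative): the numpy-based result matches a pandas union, which also sorts and drops duplicates. The sketch uses tz-naive indexes.

```python
import numpy as np
import pandas as pd

def ixlist2ix(ix_list):
    # np.hstack flattens the indexes into one datetime64 array;
    # np.unique then sorts it and drops duplicates, i.e. a set union.
    return pd.DatetimeIndex(np.unique(np.hstack(ix_list)))

# Hypothetical ranges: 2019-01-03 and 2019-01-04 are in both.
a = pd.date_range('2019-01-01', '2019-01-04', freq='D')
b = pd.date_range('2019-01-03', '2019-01-06', freq='D')

combined = ixlist2ix([b, a])  # input order does not matter
print(len(combined), combined.equals(a.union(b)))  # 6 True
```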
Upvotes: 3
Reputation: 480
combined = idx1.union_many([idx2, idx3, ...])
Note, though, that the pandas documentation describes this method as:
"A bit of a hack to accelerate unioning a collection of indexes"
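Since union_many is deprecated (as the benchmark answer notes), one way to get the same result using only Index.union is to fold it over the list with functools.reduce. A sketch, assuming the three ranges from the question:

```python
from functools import reduce

import pandas as pd

idx1 = pd.date_range('2019-01-01 00:00:00', '2019-01-01 01:00:00', freq='5min')
idx2 = pd.date_range('2019-01-01 02:00:00', '2019-01-01 03:00:00', freq='5min')
idx3 = pd.date_range('2019-01-02 00:00:00', '2019-01-02 01:00:00', freq='5min')

# Computes ((idx1 | idx2) | idx3) for any number of indexes in the list.
combined = reduce(lambda left, right: left.union(right), [idx1, idx2, idx3])

print(len(combined), combined.is_monotonic_increasing)  # 39 True
```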
Upvotes: 4
Reputation: 12684
Try the union method:
combined = idx1.union(idx2).union(idx3)
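Because union sorts its result and drops duplicates, the order in which the indexes are chained does not matter, and repeating one is harmless. A small sketch on the question's ranges (shuffled is an illustrative name):

```python
import pandas as pd

idx1 = pd.date_range('2019-01-01 00:00:00', '2019-01-01 01:00:00', freq='5min')
idx2 = pd.date_range('2019-01-01 02:00:00', '2019-01-01 03:00:00', freq='5min')
idx3 = pd.date_range('2019-01-02 00:00:00', '2019-01-02 01:00:00', freq='5min')

combined = idx1.union(idx2).union(idx3)

# Different order plus a repeated index: the union is still the same.
shuffled = idx3.union(idx1).union(idx2).union(idx1)

print(len(combined), combined.equals(shuffled), combined.is_unique)  # 39 True True
```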
Upvotes: 26