Reputation: 899
I have two pandas DataFrames with a date-time stamp as an index called "datetime" and some floats as a column called "metric1" or "metric2". When I try to concatenate them I get the following error:
ValueError: cannot reindex from a duplicate axis
After reading for many hours I cannot find a solution that fixes my problem (e.g. What does `ValueError: cannot reindex from a duplicate axis` mean? "ValueError: cannot reindex from a duplicate axis" ).
I then spent a long time trying to recreate the problem but couldn't whithout my specific data but my data is two big to post on here.
In the end it seemed like certain parts of the dataframes were incompatible but other parts were fine. After comparing many different slices of indexes I finally found incompatible sets. Can someone please help me understand why I can't concatenate these.
Basically I would like it so that all possible time-stamps are in the index and there is a column for "metric1" and "metric2". If there is no data for a column at a given timestamp then we just have NaN or something. This works fine normally with pd.concat
but it is not working in this case. To recreate please use:
timestamp,metric1
2018-03-21 15:46:36,3.5555559999999997
2018-03-21 15:47:36,5.345001
2018-03-21 15:48:36,5.719998
timestamp,metric2
2018-03-28 05:49:59,3.28
2018-03-28 05:50:59,3.45
2018-03-28 05:51:59,3.258332
2018-03-28 05:52:59,3.068333
2018-03-28 05:53:59,2.9733330000000002
2018-03-28 05:54:59,3.0650009999999996
2018-03-28 05:55:59,3.109999
2018-03-28 05:56:59,3.3683330000000002
2018-03-28 05:57:59,3.1516669999999998
2018-03-28 05:58:59,3.051666
2018-03-28 05:59:59,3.3083339999999994
2018-03-28 06:01:01,3.328333
2018-03-28 06:01:01,3.1
2018-03-28 06:02:00,3.305
2018-03-28 06:03:00,3.29
2018-03-28 06:04:00,3.2183330000000003
2018-03-28 06:05:00,3.176666
2018-03-28 06:06:00,3.353333
2018-03-28 06:07:00,3.3233330000000003
2018-03-28 06:08:00,3.393332
2018-03-28 06:09:00,3.053334
2018-03-28 06:10:00,3.268333
2018-03-28 06:11:00,3.239999
2018-03-28 06:12:00,3.223332
2018-03-28 06:13:00,3.119999
timestamp,metric2
2018-03-21 00:00:00,10.665
2018-03-21 00:01:00,10.285
2018-03-21 00:02:00,10.12834
Now let's load the CSV files:
tt1 = pd.read_csv('test1.csv', index_col=0)
tt1.index = pd.to_datetime(tt1.index)
tt2 = pd.read_csv('test2.csv', index_col=0)
tt2.index = pd.to_datetime(tt2.index)
tt4 = pd.read_csv('test4.csv', index_col=0)
tt4.index = pd.to_datetime(tt4.index)
Now let's test concatenating them:
tt3 = pd.concat([tt1, tt4], axis = 1)
tt3 = pd.concat([tt1, tt2], axis = 1)
ValueError: cannot reindex from a duplicate axis
Upvotes: 1
Views: 626
Reputation: 13750
Don't merge
, join
. join
joins on the indices.
tt1.join(tt2, how='outer')
Upvotes: 1
Reputation: 24
I have solved your question.
Have a look this solution :)
import pandas as pd
tt1 = pd.read_csv('test1.csv', index_col=0)
tt1.index = pd.to_datetime(tt1.index)
tt2 = pd.read_csv('test2.csv', index_col=0)
tt2.index = pd.to_datetime(tt2.index)
tt4 = pd.read_csv('test4.csv', index_col=0)
tt4.index = pd.to_datetime(tt4.index)
tt3 = pd.concat([tt1, tt4], axis=1)
tt4 = tt3.reset_index().merge(tt2.reset_index(), how='outer')
tt4 = tt4.set_index('timestamp')
print(tt4)
Hope it will make sense
Upvotes: -1
Reputation: 8631
You have duplicate index in tt2
. That causes the error.
Correct way to get the output:
tt1.reset_index().merge(tt2.reset_index(), how='outer')
Upvotes: 2