Reputation: 11
There is 5 DataFrames, I will call them b1, b2, b3, b4, b5
.
They have data structure which consists of the columns ['Date', 'Value']
I got the data from 2014 to 2015.
Every DataFrame has different date counting system. So, I want to get rid of the rows in the DataFrames which have non-matching dates.
How can I do that?
So, I will delete all the data if Date
of the data is not included in all DataFrames: b1, b2, b3, b4, b5
This works for the first time
for i in range(len(b2.index)):
k = 0
for j in range(len(b1.index)):
if b2['Date'][i] == b1['Date'][j]:
k = k+1
else:
k = k
if k == 1:
pass
if k == 0:
b2 = b2.drop([i])
But, after that I excute this code one more time after I did that, there would be some error like this:
KeyError Traceback (most recent call last) in () 2k = 0 3for j in range(len(b2.index)): ----> 4 if b1['Date'][i] == b2['Date'][j]: 5 k = k+1 6 else:
C:\Users\cms\Anaconda\lib\site-packages\pandas\core\series.pyc in getitem(self, key) 519def getitem(self, key): 520 try: --> 521 result = self.index.get_value(self, key) 522 523 if not np.isscalar(result):
C:\Users\cms\Anaconda\lib\site-packages\pandas\core\index.pyc in get_value(self, series, key) 1593 1594 try: -> 1595 return self._engine.get_value(s, k) 1596 except KeyError as e1: 1597 if len(self) > 0 and self.inferred_type in ['integer','boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3113)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2844)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7224)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7162)()
KeyError: 28L
What I want to do is
mlist = (b1,b2,b3,b4,b5)
for q in mlist:
for r in mlist:
for i in range(len(q.index)):
k = 0
for j in range(len(r.index)):
if q['Date'][i] == r['Date'][j]:
k = k+1
else:
k = k
if k == 1:
pass
if k == 0:
q = q.drop([i])`enter code here`
Upvotes: 1
Views: 236
Reputation: 5382
From this posting you can see an answer
merged_mlist = reduce(lambda left,right: pandas.merge(left,right,on='Date', how='inner'), mlist)
I believe the reason your code is failing is because you are not resetting the index after dropping. I think if you want to do this manually, you can do something like store the indices first and then drop all at once
inds_to_drop = []
for i in range(len(b2)):
for j in range(len(b1)):
if b2['Date'][i] != b1['Date'][j]:
inds_to_drop.append(i)
b2.drop( b2.index[inds_to_drop], inplace=True)
b2.reset_index( drop=True,inplace=True) # this may or may not be necessary, havent thought all the way through
Better yet, if you still want to implement your for loop
inds_to_drop = [ i for i,dt in enumerate(b2.Date) if not pandas.np.where( b1.Date==dt)[0].size ]
You can do that for each iteration.. Though it is just easier to run the reduce
function paired with pandas.merge
Upvotes: 0
Reputation: 32244
You want to keep the dates that are present in all datasets.
This can easily be done by performing an inner join on all of them using the pandas.merge()
function.
b = b1.merge(on='Date', right=b2, how='inner', suffixes=['', '_b2'])
b = b.merge(on='Date', right=b3, how='inner', suffixes=['', '_b3'])
b = b.merge(on='Date', right=b4, how='inner', suffixes=['', '_b4'])
b = b.merge(on='Date', right=b5, how='inner', suffixes=['_b1', '_b5'])
I am making some assumptions about your data here since you did not post any example data yet.
If I got anything wrong about your data, please let me know and I will correct my example.
Example:
b1 = pd.DataFrame({'Date':pd.date_range('2015-05-05', '2015-05-10'), 'Value':range(1,7)})
Date Value
0 2015-05-05 1
1 2015-05-06 2
2 2015-05-07 3
3 2015-05-08 4
4 2015-05-09 5
5 2015-05-10 6
b2 = pd.DataFrame({'Date':pd.date_range('2015-05-07', '2015-05-12'), 'Value':range(4,10)})
Date Value
0 2015-05-05 1
1 2015-05-06 2
2 2015-05-07 3
3 2015-05-08 4
4 2015-05-09 5
5 2015-05-10 6
b = b1.merge(on='Date', right=b2, how='inner', suffixes=['_b1', '_b2'])
Date Value_b1 Value_b2
0 2015-05-07 3 4
1 2015-05-08 4 5
2 2015-05-09 5 6
3 2015-05-10 6 7
Upvotes: 0
Reputation: 1485
How about this? Use set.intersecton()
:
s = set(b1.index)
for b in [b2,b3,b4,b5]:
s=s.intersection(set(b.index))
for b in [b1,b2,b3,b4,b5]:
b=b.drop(b.index[-b.index.isin(s)],inplace=True)
Upvotes: 0