Marcel
Marcel

Reputation: 72

Drop rows of dataframe where dates are not the same

I have two dataframes which should be of same length but are not.

Its the stock prices of the last two days and the dataframes have the columns close price, date and volume.

I would now like to drop all rows of the dataframes where the dates are not equal so that I can plot and compare the data better.

Can somebody help me with my issue?

Thanks and kind regards

176   5338.8  2017-11-23T07:40:00.000000Z      81  
177   5336.7  2017-11-23T07:45:00.000000Z     176  
178   5342.8  2017-11-23T07:50:00.000000Z     109  
179   5339.8  2017-11-23T07:55:00.000000Z     114  
180   5342.8  2017-11-23T08:00:00.000000Z     285  
181   5337.4  2017-11-23T08:05:00.000000Z     363  
182   5333.9  2017-11-23T08:10:00.000000Z     174 

150 rows vs 180 rows. I would like to drop all rows where the dates are not equal to each other.

143   3549.3  2017-11-23T07:35:00.000000Z      14  
144   3547.3  2017-11-23T07:40:00.000000Z       4  
145   3548.3  2017-11-23T07:45:00.000000Z       8  
146   3551.3  2017-11-23T07:50:00.000000Z       5  
147   3548.3  2017-11-23T07:55:00.000000Z       9  
148   3550.0  2017-11-23T08:00:00.000000Z      42  
149   3548.0  2017-11-23T08:05:00.000000Z      89  
150   3548.0  2017-11-23T08:10:00.000000Z      13  

Upvotes: 1

Views: 143

Answers (1)

jezrael
jezrael

Reputation: 862641

Use numpy.union1d for same dates and then filter by boolean indexing with isin:

a = np.union1d(df1['date'], df1['date'])
print (a)
['2017-11-23T07:40:00.000000Z' '2017-11-23T07:45:00.000000Z'
 '2017-11-23T07:50:00.000000Z' '2017-11-23T07:55:00.000000Z'
 '2017-11-23T08:00:00.000000Z' '2017-11-23T08:05:00.000000Z'
 '2017-11-23T08:10:00.000000Z']

df1 = df1[df1['date'].isin(a)]
print (df1)
          a                         date    b
176  5338.8  2017-11-23T07:40:00.000000Z   81
177  5336.7  2017-11-23T07:45:00.000000Z  176
178  5342.8  2017-11-23T07:50:00.000000Z  109
179  5339.8  2017-11-23T07:55:00.000000Z  114
180  5342.8  2017-11-23T08:00:00.000000Z  285
181  5337.4  2017-11-23T08:05:00.000000Z  363
182  5333.9  2017-11-23T08:10:00.000000Z  174

df2 = df2[df2['date'].isin(a)]
print (df2)
          a                         date   b
144  3547.3  2017-11-23T07:40:00.000000Z   4
145  3548.3  2017-11-23T07:45:00.000000Z   8
146  3551.3  2017-11-23T07:50:00.000000Z   5
147  3548.3  2017-11-23T07:55:00.000000Z   9
148  3550.0  2017-11-23T08:00:00.000000Z  42
149  3548.0  2017-11-23T08:05:00.000000Z  89
150  3548.0  2017-11-23T08:10:00.000000Z  13

Upvotes: 1

Related Questions