Animesh Shah
Animesh Shah

Reputation: 146

Deleting columns with specific conditions

I have a dataframe output from the python script which gives following output

Datetime High Low Time
546 2021-06-15 14:30:00 15891.049805 15868.049805 14:30:00
547 2021-06-15 14:45:00 15883.000000 15869.900391 14:45:00
548 2021-06-15 15:00:00 15881.500000 15866.500000 15:00:00
549 2021-06-15 15:15:00 15877.750000 15854.549805 15:15:00
550 2021-06-15 15:30:00 15869.250000 15869.250000 15:30:00

i Want to remove all rows where time is equal to 15:30:00. tried different things but unable to do. Help please.

import pandas as pd
import datetime as dt
from pandas.core.frame import DataFrame
import yfinance as yf
import numpy as np
import xlwings as xw

ticker = "^NSEI"
# getting Intraday data for plotting 15 minutes and 75 minutes chart
en = dt.date.today() + dt.timedelta(days=1)
st = en - dt.timedelta(days=30)
df = yf.download(ticker,start = st, end=en, interval = '15m')\
.drop(['Open', 'Close', 'Adj Close', 'Volume'], axis = 1).reset_index()
df['Datetime'] = pd.to_datetime(df['Datetime'].astype(str).str[:-6])
df['Time'] = df['Datetime'].dt.time


print(df)

Upvotes: 0

Views: 141

Answers (3)

Karn Kumar
Karn Kumar

Reputation: 8826

As Pandas also provides vectorized string operations across the dataframe, thus it's easy to get the rows those contain the string:

Data-Frame

>>> df
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00
4  6/15/2021 15:30  15869.25000  15869.25000  15:30:00

Result:

Method One:

Using str.contains ...

>>> df[~df['Time'].str.contains('15:30:00')]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

OR

if you are looking based on the Datetime

>>> df[~df['Datetime'].str.contains('15:30')]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

OR

>>> df[~df.Time.str.contains("15:30") == True]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

OR

>>> df[df['Time'].str.contains('15:30') == False]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

OR

>>> df[df['Time'].str.contains('15:30') == 0]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

Method two:

Using isin ...

>>> df[~df['Time'].isin(['15:30:00'])]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

Method three:

Using Not equal to of dataframe and other, element-wise (binary operator ne).

>>> df[df.Time != '15:30:00']
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

OR

>>> df[df['Time'] != '15:30:00']
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

OR

>>> df[df['Time'].ne('15:30:00')]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

Upvotes: 3

user13300964
user13300964

Reputation:

You can try this:

import pandas as pd

test_data=pd.read_csv("test.csv")
test_data=test_data[test_data["Time"]!="15:30:00"]
print(test_data)

Just select rows based on condition.

Upvotes: 0

inarticulatus
inarticulatus

Reputation: 175

The way I did was the following,

First we get the the time we want to remove from the dataset, that is 15:30:00 in this case.

Since the Datetime column is in the datetime format, we cannot compare the time as strings. So we convert the given time in the datetime.time() format.

rm_time = dt.time(15,30)

With this, we can go about using the DataFrame.drop()

df.drop(df[df.Datetime.dt.time == rm_time].index)

Upvotes: 2

Related Questions