Dana7371

Reputation: 57

How to extract data by date in Python pandas

I am trying to extract rows that fall within a date range (for example 06/20/2021 - 06/30/2021). Right now my script reads the CSV file, sorts the data by date, and finds any duplicates. The next step is to extract all rows within a given date range, and I am wondering how to do this. Any help is much appreciated :). This is what I have so far:

import pandas as pd
from datetime import date, timedelta

#df = pd.read_excel(r"/Users/britevoxops2/Desktop/sample_date.xlsx") #reading Excel File

df = pd.read_csv(r"/Users/filename/Desktop/sample_date.csv")
print(df) #print original data
df.head()

Final_result = df.sort_values('Joining Date') #sorting date
print(Final_result)

duplicate = df[df['Name'].duplicated()] #finding duplicate names (duplicated() already returns a boolean mask)
print('Here are the Duplicates: \n',duplicate) 


 

Upvotes: 1

Views: 6793

Answers (3)

Karn Kumar

Reputation: 8816

Below should work for you.

Sample DataFrame:

>>> df
          Date
0   06/10/2021
1   06/11/2021
2   06/12/2021
3   06/13/2021
4   06/14/2021
5   06/15/2021
6   06/16/2021
7   06/17/2021
8   06/18/2021
9   06/19/2021
10  06/20/2021
11  06/21/2021
12  06/22/2021
13  06/23/2021
14  06/24/2021
15  06/25/2021
16  06/26/2021
17  06/27/2021
18  06/28/2021
19  06/29/2021
20  06/30/2021

Convert the Date column to datetime format:

>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df
         Date
0  2021-06-10
1  2021-06-11
2  2021-06-12
3  2021-06-13
4  2021-06-14
5  2021-06-15
6  2021-06-16
7  2021-06-17
8  2021-06-18
9  2021-06-19
10 2021-06-20
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
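If the strings are known to be month/day/year, passing an explicit format can make the conversion less dependent on pandas guessing the layout. This is only a minimal sketch: the three-row frame is a made-up sample, and `errors="coerce"` is an optional choice that turns unparseable values into NaT instead of raising.

```python
import pandas as pd

# Hypothetical sample mirroring the MM/DD/YYYY strings shown above.
df = pd.DataFrame({"Date": ["06/10/2021", "06/20/2021", "06/30/2021"]})

# An explicit format avoids day/month guessing.
# errors="coerce" converts values that do not match into NaT.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y", errors="coerce")

print(df.dtypes)
```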

Now select the range between dates you want:

>>> df[(df['Date'] > '06/21/2021') & (df['Date'] <= '06/30/2021')]
         Date
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30

Another way is to build a boolean mask first and then pass it to df.loc[mask]:

>>> mask = (df['Date'] > '06/21/2021') & (df['Date'] <= '06/30/2021')

>>> print(df.loc[mask])
         Date
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
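Applied to the question's own range (06/20/2021 - 06/30/2021, both ends included), the mask approach might look like the sketch below. The small frame is a hypothetical stand-in for the CSV; only the 'Name' and 'Joining Date' column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Joining Date": ["06/19/2021", "06/20/2021", "06/25/2021", "07/01/2021"],
})
df["Joining Date"] = pd.to_datetime(df["Joining Date"], format="%m/%d/%Y")

start = pd.Timestamp("2021-06-20")
end = pd.Timestamp("2021-06-30")

# >= and <= keep both end points of the range.
mask = (df["Joining Date"] >= start) & (df["Joining Date"] <= end)
in_range = df.loc[mask]
print(in_range)
```

Using `pd.Timestamp` objects for the bounds is a design choice here: it sidesteps any ambiguity about how a date string would be parsed.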

Third Method:

Using pandas.Series.between

>>> df[df.Date.between("06/21/2021", "06/30/2021")]
         Date
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
# df[df['Date'].between("06/21/2021", "06/30/2021")]
# df.loc[df['Date'].between('06/21/2021', '06/30/2021', inclusive='both')] <-- `inclusive` accepts 'both', 'neither', 'left' or 'right' (pandas 1.3+); older versions took True or False.
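To see how the `inclusive` argument changes which end points are kept, here is a small sketch. It assumes pandas 1.3 or newer, where `inclusive` takes the strings 'both', 'neither', 'left' or 'right'; the date range is generated just for illustration.

```python
import pandas as pd

# Illustrative frame: one row per day from 19 June to 1 July 2021.
df = pd.DataFrame({"Date": pd.date_range("2021-06-19", "2021-07-01", freq="D")})

lo, hi = "2021-06-21", "2021-06-30"

both = df[df["Date"].between(lo, hi, inclusive="both")]        # keeps lo and hi
neither = df[df["Date"].between(lo, hi, inclusive="neither")]  # drops lo and hi
left = df[df["Date"].between(lo, hi, inclusive="left")]        # keeps lo only

print(len(both), len(neither), len(left))
```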

Fourth method: using df.query. You can refer to variables in the environment by prefixing them with an '@' character, as shown below.

>>> start_date, end_date = "06/21/2021", "06/30/2021"

>>> print(df.query('Date >= @start_date and Date <= @end_date'))
         Date
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
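The column in the question is called 'Joining Date', which contains a space. In `df.query`, such names can be wrapped in backticks (supported since pandas 0.25). A minimal sketch, with a made-up frame:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Joining Date": pd.to_datetime(["2021-06-19", "2021-06-20", "2021-06-30"]),
})

start_date = pd.Timestamp("2021-06-20")
end_date = pd.Timestamp("2021-06-30")

# Backticks quote a column name that is not a valid Python identifier.
result = df.query("`Joining Date` >= @start_date and `Joining Date` <= @end_date")
print(result)
```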

Upvotes: 0

user42

Reputation: 959

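You can convert the column to pandas datetime format and then extract the required dates with comparison operators:

df['Joining Date'] = pd.to_datetime(df['Joining Date'])
# ISO dates (YYYY-MM-DD) are unambiguous; this assumes the bounds are 3 and 23 March 2021
df_date = df[(df['Joining Date'] < '2021-03-23') & (df['Joining Date'] > '2021-03-03')]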

Upvotes: 2

user16401871

Reputation:

df.loc['2021-06-20' : '2021-06-30']
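Slicing with `df.loc` like this only selects by date when the DataFrame's index is a DatetimeIndex, which the question's frame does not have by default. A minimal sketch of one way to get there, assuming the 'Joining Date' column from the question and made-up rows:

```python
import pandas as pd

# Hypothetical rows, deliberately out of order.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Joining Date": ["07/01/2021", "06/20/2021", "06/30/2021", "06/19/2021"],
})
df["Joining Date"] = pd.to_datetime(df["Joining Date"], format="%m/%d/%Y")

# Move the dates into the index and sort, so label slicing works reliably.
indexed = df.set_index("Joining Date").sort_index()

# Label slices on a DatetimeIndex include both end points.
subset = indexed.loc["2021-06-20":"2021-06-30"]
print(subset)
```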

Upvotes: 0
