Reputation: 57
I am trying to extract rows that fall within a date range (for example 06/20/2021 - 06/30/2021). Right now my script reads the CSV file, sorts the data by date, and finds any duplicates. The next step is to extract all rows within a given date range, and I am wondering how to do this. Any help is very appreciated :). This is what I have so far:
import pandas as pd
from datetime import date, timedelta
#df = pd.read_excel(r"/Users/britevoxops2/Desktop/sample_date.xlsx") #reading Excel File
df = pd.read_csv(r"/Users/filename/Desktop/sample_date.csv")
print(df) #print original data
df.head()
Final_result = df.sort_values('Joining Date') #sorting date
print(Final_result)
duplicate = df[df['Name'].duplicated()] #finding duplicate names
print('Here are the Duplicates: \n',duplicate)
Upvotes: 1
Views: 6793
Reputation: 8816
Below should work for you.
>>> df
Date
0 06/10/2021
1 06/11/2021
2 06/12/2021
3 06/13/2021
4 06/14/2021
5 06/15/2021
6 06/16/2021
7 06/17/2021
8 06/18/2021
9 06/19/2021
10 06/20/2021
11 06/21/2021
12 06/22/2021
13 06/23/2021
14 06/24/2021
15 06/25/2021
16 06/26/2021
17 06/27/2021
18 06/28/2021
19 06/29/2021
20 06/30/2021
Convert the Date column to datetime format:
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df
Date
0 2021-06-10
1 2021-06-11
2 2021-06-12
3 2021-06-13
4 2021-06-14
5 2021-06-15
6 2021-06-16
7 2021-06-17
8 2021-06-18
9 2021-06-19
10 2021-06-20
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
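If the strings are known to be in MM/DD/YYYY form, passing an explicit format may be safer than letting pandas infer it. The snippet below is a minimal sketch under that assumption; the three sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in MM/DD/YYYY form, mirroring the frame above.
df = pd.DataFrame({"Date": ["06/10/2021", "06/20/2021", "06/30/2021"]})

# An explicit format avoids day/month guessing; errors="coerce" turns
# strings that do not match into NaT instead of raising an exception.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y", errors="coerce")

print(df.dtypes)
print(df)
```

Checking `df["Date"].isna().sum()` afterwards is a quick way to see whether any value failed to parse.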
>>> df[(df['Date'] > '06/21/2021') & (df['Date'] <= '06/30/2021')]
Date
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
Another way is to build a boolean mask and then use df.loc[mask]:
>>> mask = (df['Date'] > '06/21/2021') & (df['Date'] <= '06/30/2021')
>>> print(df.loc[mask])
Date
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
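Note that the strict `>` in the mask above leaves out 06/21/2021 itself. If the start date should be part of the result, a variant along these lines might work; it is only a sketch, and the frame is rebuilt here with `pd.date_range` so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical frame covering 06/10/2021 through 06/30/2021, as in the answer.
df = pd.DataFrame({"Date": pd.date_range("2021-06-10", "2021-06-30", freq="D")})

# Timestamp bounds make the intended dates unambiguous.
start = pd.Timestamp("2021-06-21")
end = pd.Timestamp("2021-06-30")

# >= keeps the start date itself; a strict > would drop 2021-06-21.
mask = (df["Date"] >= start) & (df["Date"] <= end)
selected = df.loc[mask]

print(selected)
```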
Using pandas.Series.between, which includes both endpoints by default:
>>> df[df.Date.between("06/21/2021", "06/30/2021")]
Date
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
# df[df['Date'].between("06/21/2021", "06/30/2021")]
# df.loc[df['Date'].between('06/21/2021', '06/30/2021', inclusive='both')] <-- `inclusive` accepts 'both', 'neither', 'left' or 'right' (pandas 1.3+); older versions took True or False.
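To make the effect of `inclusive` concrete, here is a small sketch comparing three settings. It assumes pandas 1.3 or newer, where the parameter takes a string, and it rebuilds the sample frame so it can be run independently.

```python
import pandas as pd

# Hypothetical frame covering 06/10/2021 through 06/30/2021.
df = pd.DataFrame({"Date": pd.date_range("2021-06-10", "2021-06-30", freq="D")})

lo, hi = "2021-06-21", "2021-06-30"

# Assumes pandas >= 1.3, where `inclusive` is one of
# "both", "neither", "left", "right".
both = df[df["Date"].between(lo, hi, inclusive="both")]        # keeps 06-21 and 06-30
neither = df[df["Date"].between(lo, hi, inclusive="neither")]  # drops both endpoints
left = df[df["Date"].between(lo, hi, inclusive="left")]        # keeps 06-21, drops 06-30

print(len(both), len(neither), len(left))
```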
Using df.query. You can refer to variables in the environment by prefixing them with an '@' character, as shown below:
>>> start_date, end_date = "06/21/2021", "06/30/2021"
>>> print(df.query('Date >= @start_date and Date <= @end_date'))
Date
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
Upvotes: 0
Reputation: 959
Convert the column to the pandas datetime format first, then extract the dates you need with a comparison such as:
df_date = df[(df['Joining Date'] < '23-03-21') & (df['Joining Date'] > '03-03-21')]
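Applied to the question's own setup, the whole flow might look roughly like the sketch below. Only the column names `Name` and `Joining Date` come from the question; the CSV rows are invented, and `io.StringIO` stands in for the real file so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for sample_date.csv; only the
# column names 'Name' and 'Joining Date' are taken from the question.
csv_text = """Name,Joining Date
Ann,06/18/2021
Bob,06/20/2021
Cid,06/25/2021
Bob,06/30/2021
Dee,07/02/2021
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse the MM/DD/YYYY strings into real datetimes before comparing.
df["Joining Date"] = pd.to_datetime(df["Joining Date"], format="%m/%d/%Y")

# between() includes both endpoints by default, so 06/20 and 06/30 are kept.
in_range = df[df["Joining Date"].between("2021-06-20", "2021-06-30")]
in_range = in_range.sort_values("Joining Date")

print(in_range)
```

With a real file, the `io.StringIO(csv_text)` argument would be replaced by the path passed to `pd.read_csv` in the question.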
Upvotes: 2