Reputation: 384
I'm trying to process each day's data using pandas. Below are my code, data and current output. However, the function getUniqueDates() has to traverse the full df to build the list of unique dates, as shown below. Is there a simple and efficient way to get each day's data so that it can be passed to the function processDataForEachDate()? Traversing a big list is time-consuming. I have stripped down the columns in this example to keep it simple.
import datetime
import pandas as pd

data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
        'noOfJobs': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]}
df = pd.DataFrame(data, columns=['date', 'noOfJobs'])
df = df.astype(dtype={"date": 'datetime64[ns]'})
print(df)
# Output====================================
                        date  noOfJobs
0 2014-05-01 18:47:05.069722        34
1 2014-05-01 18:47:05.119994        25
2 2014-05-02 18:47:05.178768        26
3 2014-05-02 18:47:05.230071        15
4 2014-05-02 18:47:05.230071        15
5 2014-05-02 18:47:05.280592        14
6 2014-05-03 18:47:05.332662        26
7 2014-05-03 18:47:05.385109        25
8 2014-05-04 18:47:05.436523        62
9 2014-05-04 18:47:05.486877        41
def getUniqueDates():
    todaysDate = datetime.datetime.today().strftime('%Y-%m-%d')
    listOfDates=[]
    for c,r in df.iterrows():
        if r.date.date() != todaysDate:
            todaysDate=r.date.date()
            listOfDates.append(todaysDate)
    return listOfDates
listOfDates = getUniqueDates()
print(listOfDates)
# Output====================================
[datetime.date(2014, 5, 1),
datetime.date(2014, 5, 2),
datetime.date(2014, 5, 3),
datetime.date(2014, 5, 4)]
for eachDate in listOfDates:
    processDataForEachDate(eachDate)
Upvotes: 0
Views: 432
Reputation: 40918
You can access a NumPy array of unique dates with:
>>> df.date.dt.date.unique()
array([datetime.date(2014, 5, 1), datetime.date(2014, 5, 2),
datetime.date(2014, 5, 3), datetime.date(2014, 5, 4)], dtype=object)
dt is an accessor of the pandas Series df.date. Basically, it's a class that acts as a property-like interface to a bunch of date-time-related methods. The benefit is that it is vectorized (a pandas developer has published a comparison to .iterrows()), and that accessor methods also use a "cached property" design.
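If the goal is to hand each day's rows to a processing function, rather than only the list of dates, grouping on the same dt.date values avoids the explicit row loop altogether. The following is a minimal sketch, not a definitive implementation: process_data_for_each_date is a hypothetical stand-in for the question's processDataForEachDate(), and it is assumed here to take the date together with that day's sub-DataFrame.

```python
import pandas as pd

data = {
    "date": [
        "2014-05-01 18:47:05.069722", "2014-05-01 18:47:05.119994",
        "2014-05-02 18:47:05.178768", "2014-05-02 18:47:05.230071",
        "2014-05-02 18:47:05.230071", "2014-05-02 18:47:05.280592",
        "2014-05-03 18:47:05.332662", "2014-05-03 18:47:05.385109",
        "2014-05-04 18:47:05.436523", "2014-05-04 18:47:05.486877",
    ],
    "noOfJobs": [34, 25, 26, 15, 15, 14, 26, 25, 62, 41],
}
df = pd.DataFrame(data)
df["date"] = pd.to_datetime(df["date"])


def process_data_for_each_date(day, day_df):
    # Hypothetical placeholder: the real per-day processing is not shown
    # in the question, so this just totals the jobs for that day.
    return day, int(day_df["noOfJobs"].sum())


# Group rows by calendar date; each iteration yields the date key and
# the sub-DataFrame holding only that day's rows.
results = [
    process_data_for_each_date(day, day_df)
    for day, day_df in df.groupby(df["date"].dt.date)
]

for day, total in results:
    print(day, total)
```

groupby returns the groups sorted by key by default, so the days come out in chronological order even if the rows in df are not sorted.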
Upvotes: 1