user2371563

Reputation: 384

Unique data for each day using Python/Pandas Dataframe

I'm trying to process each day's data using pandas. Below are my code, data, and current output. The function getUniqueDates() has to traverse the full df row by row to build the list of unique dates shown below. Is there a simple and efficient way to get each day's data so that it can be passed to processDataForEachDate()? Traversing a big DataFrame this way is time-consuming. I have stripped down the columns in this example to keep it simple.

    import pandas as pd

    data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
            'noOfJobs': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]}
    df = pd.DataFrame(data, columns=['date', 'noOfJobs'])
    df = df.astype(dtype={"date": 'datetime64[ns]'})
    print(df)

    # Output====================================
                            date  noOfJobs
    0 2014-05-01 18:47:05.069722        34
    1 2014-05-01 18:47:05.119994        25
    2 2014-05-02 18:47:05.178768        26
    3 2014-05-02 18:47:05.230071        15
    4 2014-05-02 18:47:05.230071        15
    5 2014-05-02 18:47:05.280592        14
    6 2014-05-03 18:47:05.332662        26
    7 2014-05-03 18:47:05.385109        25
    8 2014-05-04 18:47:05.436523        62
    9 2014-05-04 18:47:05.486877        41


    def getUniqueDates():
        # Record a date each time it differs from the previous row's date
        # (relies on df being sorted by date).
        previousDate = None
        listOfDates = []
        for c, r in df.iterrows():
            if r.date.date() != previousDate:
                previousDate = r.date.date()
                listOfDates.append(previousDate)
        return listOfDates

    listOfDates = getUniqueDates()
    print(listOfDates)

    # Output====================================
    [datetime.date(2014, 5, 1),
     datetime.date(2014, 5, 2),
     datetime.date(2014, 5, 3),
     datetime.date(2014, 5, 4)]



    for eachDate in listOfDates:
        processDataForEachDate(eachDate)

Upvotes: 0

Views: 432

Answers (1)

Brad Solomon

Reputation: 40918

You can access a NumPy array of unique dates with:

>>> df.date.dt.date.unique()
array([datetime.date(2014, 5, 1), datetime.date(2014, 5, 2),
       datetime.date(2014, 5, 3), datetime.date(2014, 5, 4)], dtype=object)

dt is an accessor on the pandas Series df.date. Basically, it's a class that acts as a property-like interface to a bunch of date-time-related attributes and methods. The benefit is that it is vectorized, unlike a row-by-row loop over .iterrows() (a Pandas developer has published a comparison of the two), and that accessors also use a "cached property" design.
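If the goal is to hand each day's rows to processDataForEachDate(), one possible approach is to group on the same df.date.dt.date values instead of looping over a list of dates and filtering each time. The following is only a sketch: the two-argument signature of processDataForEachDate() and its body (summing noOfJobs) are assumptions made for illustration, since the question does not show that function.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2014-05-01 18:47:05.069722", "2014-05-01 18:47:05.119994",
        "2014-05-02 18:47:05.178768", "2014-05-02 18:47:05.230071",
        "2014-05-02 18:47:05.230071", "2014-05-02 18:47:05.280592",
        "2014-05-03 18:47:05.332662", "2014-05-03 18:47:05.385109",
        "2014-05-04 18:47:05.436523", "2014-05-04 18:47:05.486877",
    ]),
    "noOfJobs": [34, 25, 26, 15, 15, 14, 26, 25, 62, 41],
})


def processDataForEachDate(eachDate, dayFrame):
    # Hypothetical stand-in for the question's function: it receives the
    # date and only that day's rows, and here simply totals the jobs.
    return int(dayFrame["noOfJobs"].sum())


# Unique calendar dates, computed without an explicit Python loop.
unique_dates = df["date"].dt.date.unique()

# Group rows by calendar date; each iteration yields (date, sub-DataFrame).
results = {}
for eachDate, dayFrame in df.groupby(df["date"].dt.date):
    results[eachDate] = processDataForEachDate(eachDate, dayFrame)

print(results)
```

With groupby, the per-day rows are split once rather than being re-selected for every date, and the group key is the same datetime.date object that unique() returns.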

Upvotes: 1
