LukeyP

Grouping time values using Pandas Groupby

So I have a Django QuerySet (filterTable) containing filtered data from my database. The data contains a list of time values in the form YYYY-MM-DD HH:MM:SS, and I'm trying to group each second under its corresponding minute and each minute under its corresponding hour. I then need to pass this grouping to the front end of my website, which uses JavaScript.

So far I have code that does the grouping, but it's not exactly what I want. The code and its output are below:

import pandas as pd

# Make a pandas DataFrame from the database QuerySet and set dateTime as the index so hour, minute and second are easy to extract later
queryToDataFrame = pd.DataFrame(filterTable.values()[1:], columns=filterTable.values()[0]).set_index('dateTime')

hours = queryToDataFrame.index.hour  # extract hours from the DatetimeIndex
minutes = queryToDataFrame.index.minute
seconds = queryToDataFrame.index.second

timepd = pd.DataFrame({'hours': hours, 'minutes': minutes, 'seconds': seconds})  # put time values into a new DataFrame for easier processing
groupVar = timepd.groupby([timepd.hours, timepd.minutes]).apply(print)  # group seconds by minute and minutes by hour, printing each group

Output

   hours  minutes  seconds
0     20       52       10
1     20       52       30
2     20       52       35
   hours  minutes  seconds
3     20       53        0
4     20       53        5
5     20       53       10

...

This is close to the grouping I want, but I'm actually hoping for something that looks like this:

   hours  minutes  seconds
0     20       52        0
1                        5
2                       10
3              53        0
4                        5
5     21        1        0

That is, each minute value should appear only once, alongside all the seconds that fall within that minute, and likewise each hour should appear only once for all the minutes within it.

I've seen similar examples, but each one differs just enough to make this task confusing for me. Is this possible using only pandas, or is there a different way to achieve it? The data this code will handle is expected to be very large, so I'm trying to avoid loops.
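The display rule described above (show an hour or minute only on the first row where it appears) can be expressed without a Python loop by comparing each row with the previous one. This is a minimal sketch, not the only approach: it assumes the rows are already sorted by time, uses illustrative values that match the desired output, and builds a separate display copy (`shown`, a hypothetical name) with blank strings in place of repeated hours and minutes.

```python
import pandas as pd

# Illustrative values matching the desired output above (assumed sorted by time)
timepd = pd.DataFrame(
    {
        "hours": [20, 20, 20, 20, 20, 21],
        "minutes": [52, 52, 52, 53, 53, 1],
        "seconds": [0, 5, 10, 0, 5, 0],
    }
)

# A row starts a new hour if its hour differs from the previous row's hour.
# shift() moves values down one row; the first row compares against NaN, so it is always "new".
new_hour = timepd["hours"].ne(timepd["hours"].shift())

# A row starts a new minute if the hour changed or the minute changed.
new_minute = new_hour | timepd["minutes"].ne(timepd["minutes"].shift())

# Cast to object so blank strings can sit next to integers, then blank the repeats
shown = timepd.astype(object)
shown.loc[~new_hour, "hours"] = ""
shown.loc[~new_minute, "minutes"] = ""
print(shown)
```

`shift()` compares each row only with its neighbour, so an hour that reappears later (for example on the next day) is shown again; `duplicated()` would instead blank every later occurrence. Mixing blanks and numbers forces the columns to `object` dtype, so this copy is best kept for display only.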

If you need some data to test with, this is the QuerySet that gets converted into the queryToDataFrame DataFrame:

<QuerySet [{'id': 10063705, 'valueName': 'Temp', 'value': 3.3, 'units': 'C', 'dateTime': datetime.datetime(2021, 3, 18, 20, 51, 50, tzinfo=<UTC>), 'timestamp': 1616122310.0}, 
{'id': 10063745,'valueName': 'Temp', 'value': 3.4, 'units': 'C', 'dateTime': datetime.datetime(2021, 3, 18, 20, 52, 10, tzinfo=<UTC>), 'timestamp': 1616122330.0}]
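To try the code without Django, the two sample rows can be written as plain dicts, which is what `QuerySet.values()` yields. A minimal sketch, assuming UTC datetimes as in the sample (the `rows` name is illustrative): because each row is already a dict keyed by field name, the list can be passed to `pd.DataFrame` directly (with a real QuerySet, `list(filterTable.values())`), so there is no header row to skip and the first record is kept.

```python
import datetime

import pandas as pd

# The two sample rows above as plain dicts, so the example runs without Django
rows = [
    {"id": 10063705, "valueName": "Temp", "value": 3.3, "units": "C",
     "dateTime": datetime.datetime(2021, 3, 18, 20, 51, 50, tzinfo=datetime.timezone.utc),
     "timestamp": 1616122310.0},
    {"id": 10063745, "valueName": "Temp", "value": 3.4, "units": "C",
     "dateTime": datetime.datetime(2021, 3, 18, 20, 52, 10, tzinfo=datetime.timezone.utc),
     "timestamp": 1616122330.0},
]

# A list of dicts maps straight onto DataFrame rows; dict keys become columns
queryToDataFrame = pd.DataFrame(rows).set_index("dateTime")

# The DatetimeIndex exposes time components as whole arrays, so no row loop is needed
timepd = pd.DataFrame(
    {
        "hours": queryToDataFrame.index.hour,
        "minutes": queryToDataFrame.index.minute,
        "seconds": queryToDataFrame.index.second,
    }
)
print(timepd)
```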


Answers (1)

Laurent

You could try this:

import pandas as pd

df = pd.DataFrame(
    {
        "hours": [20, 20, 20, 20, 20, 20],
        "minutes": [52, 52, 52, 53, 53, 53],
        "seconds": [10, 30, 35, 0, 5, 10],
    }
)

# Convert values to strings so that repeated values can be replaced with ""
df = df.astype(str)

# Iterate on a copy of the dataframe and blank out repeated hour/minute values
previous_row = df.iloc[0]
for i, row in df.copy().iterrows():
    if i == 0:
        continue
    # Compare hour and minute together, so the same minute in a different hour is kept
    if row["hours"] == previous_row["hours"] and row["minutes"] == previous_row["minutes"]:
        df.loc[i, "minutes"] = ""
        df.loc[i, "hours"] = ""
    previous_row = row

print(df)
#Outputs
  hours minutes seconds
0    20      52      10
1                    30
2                    35
3    20      53       0
4                     5
5                    10
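Since the question also mentions passing the grouping to a JavaScript front end, a nested structure may be easier to consume there than a table with blank cells. This is a sketch under that assumption; the `{hour: {minute: [seconds]}}` layout and the names `per_minute`, `nested` and `payload` are illustrative, not something the front end requires. The per-row work is done by `groupby(...).agg(list)`; the remaining Python loop runs once per distinct minute, not once per row.

```python
import json

import pandas as pd

df = pd.DataFrame(
    {
        "hours": [20, 20, 20, 20, 20, 20],
        "minutes": [52, 52, 52, 53, 53, 53],
        "seconds": [10, 30, 35, 0, 5, 10],
    }
)

# Collect the seconds of each (hour, minute) pair into a list
per_minute = df.groupby(["hours", "minutes"])["seconds"].agg(list)

# Nest as {hour: {minute: [seconds, ...]}}. JSON object keys must be strings,
# and seconds are cast to int so json.dumps never sees NumPy integer types.
nested = {}
for (hour, minute), seconds in per_minute.items():
    nested.setdefault(str(hour), {})[str(minute)] = [int(s) for s in seconds]

payload = json.dumps(nested)
print(payload)
```

On the browser side, `JSON.parse` turns the string back into nested objects keyed by hour and then minute.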
