Reputation: 540
I have a large dataframe (around 35k entries). The index of this dataframe is composed of dates (like 2014-02-12), and these dates are not unique. What I need to do is find, for each date, the max value and create a new dataframe with it. I created a solution that works (it is down below), but it takes a lot of time to process. Does anyone know a faster way I could do this? Thank you.
import pandas as pd

# Create an empty dataframe with the same columns
dataset0514maxrec = pd.DataFrame(columns=dataset0514max.columns.values)
dataset0514maxrec.index.name = 'Date'
# Get the unique index values, find each group, recover its max value and append it
for i in dataset0514max.index.unique():
    tempDF1 = dataset0514max.loc[dataset0514max.index.isin([i])]
    tempDF2 = tempDF1[tempDF1['Data_Value'] == tempDF1['Data_Value'].max()]
    dataset0514maxrec = dataset0514maxrec.append(tempDF2.head(1))
print(dataset0514maxrec)
Upvotes: 0
Views: 313
Reputation: 402363
groupby with levels
Group on the index (level 0) and take the max per date:
df.groupby(level=0).Data_Value.max().reset_index()
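As a quick illustration, here is a minimal, self-contained sketch; the sample data is made up, and only the Data_Value column name comes from the question:

import pandas as pd

df = pd.DataFrame(
    {'Data_Value': [10, 25, 25, 7, 18]},
    index=pd.Index(['2014-02-12', '2014-02-12', '2014-02-13',
                    '2014-02-13', '2014-02-14'], name='Date')
)

# One max per index value (level 0), then back to a regular column
out = df.groupby(level=0).Data_Value.max().reset_index()
print(out)
#          Date  Data_Value
# 0  2014-02-12          25
# 1  2014-02-13          25
# 2  2014-02-14          18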
The next two options require the index to be a datetime index. If it isn't, convert it:
df.index = pd.to_datetime(df.index)
resample
df.resample('D').max()
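A minimal, self-contained sketch of the resample route (again with made-up sample data):

import pandas as pd

df = pd.DataFrame(
    {'Data_Value': [10, 25, 7, 18]},
    index=pd.Index(['2014-02-12', '2014-02-12', '2014-02-13', '2014-02-15'], name='Date')
)
df.index = pd.to_datetime(df.index)   # resample needs a DatetimeIndex
print(df.resample('D').max())
# Note: calendar days with no rows (here 2014-02-14) come back as NaN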
sort_values + duplicated
df = df.sort_values('Data_Value')
m = ~df.index.duplicated(keep='last')
df = df[m]
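A self-contained sketch of this last route (made-up sample data). With an ascending sort, the max for each date ends up in that date's last row, which is why keep='last' is needed when dropping duplicated index values:

import pandas as pd

df = pd.DataFrame(
    {'Data_Value': [10, 25, 25, 7]},
    index=pd.Index(['2014-02-12', '2014-02-12', '2014-02-13', '2014-02-13'], name='Date')
)
df = df.sort_values('Data_Value')            # ascending: max per date ends up last
df = df[~df.index.duplicated(keep='last')]   # keep only that last (max) row per date
print(df)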
Upvotes: 4