Reputation:
Say I have thousands of random (x, y) data points, and I store the x and y values in two columns of a dataframe. Note that all x values are integers, while the y values are continuous. If I plot them as a scatter plot using Matplotlib, it looks like the plot below. Now I want to get the lower boundary of the plot, which I have drawn as a red curve. How should I do it? To be clear, I want the indices of the (x, y) pairs with the minimum y value for each x, so the number of indices should equal len(set(x)).
Upvotes: 2
Views: 2968
Reputation: 5389
Is df.groupby('x').min() what you want?
A full example:
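import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Random test data: integer x in [0, 10), continuous y in [0, 1)
df = pd.DataFrame({'x': np.random.randint(10, size=1000), 'y': np.random.rand(1000)})
# Scatter plot of all points in black
df.plot.scatter('x', 'y', color='k')
# Minimum y for each x, drawn in red on the same axes
df.groupby('x').min().plot(ax=plt.gca(), color='red')
plt.show()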
To get the indices of the original dataframe, you can use idxmin on the groupby, e.g.
df.groupby('x').idxmin()
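As a rough sketch of how those indices could be used, the snippet below selects the y column before calling idxmin, so the result is a Series of index labels rather than a one-column dataframe, and then pulls the matching rows with df.loc. The seeded random data and the names min_idx and boundary are assumptions for illustration only; they are not from the question.

```python
import numpy as np
import pandas as pd

# Synthetic data, assumed for illustration: integer x, continuous y.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.integers(0, 10, size=1000), "y": rng.random(1000)})

# Index label of the row with the smallest y for each distinct x.
min_idx = df.groupby("x")["y"].idxmin()

# Rows of the original dataframe that lie on the lower boundary, ordered by x.
boundary = df.loc[min_idx].sort_values("x")

# One index per distinct x value, as the question asks.
assert len(min_idx) == df["x"].nunique()
```

Here boundary keeps the original index labels, so it can be plotted directly (boundary.x against boundary.y) or used to look up other columns of the same rows.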
Upvotes: 1
Reputation: 152
Try:
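import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,1,2,3],'B':[1.3,2.6,3.2,5.6,4.5,3.1]})
fig, ax = plt.subplots()
# Draw the raw points as markers, since they are not ordered along A
ax.plot(df.A, df.B, 'k.')
# Minimum B for each A, turned back into ordinary columns
temp = df.groupby('A')['B'].min().reset_index()
# Lower boundary as a red line
ax.plot(temp.A, temp.B, 'r-')
plt.show()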
Upvotes: 0