Reputation: 2189
I have a data frame like this:
df
col1 col2 col3 col4
A B C 12
A B C 8
A B C 10
P Q R 12
P Q R 11
K L S 1
K L S 15
U V R 20
I want to get those rows where col4 value is maximum for col3 values for each col1 and col2 combinations
for example the result I am looking for is
col1 col2 col3 col4
A B C 12
P Q R 12
K L S 15
U V R 20
how to do it in most efficient way using pandas ?
Upvotes: 0
Views: 712
Reputation: 5730
You need to use groupby:
import pandas as pd
# setup test data
data = {'col1': ['A', 'A', 'A', 'P', 'P', 'K', 'K', 'U'], 'col2': ['B', 'B', 'B', 'Q', 'Q', 'L', 'L', 'V'],
'col3': ['C', 'C', 'C', 'R', 'R', 'S', 'S', 'R'], 'col4': [12, 8, 10, 12,11,1,15,20]}
data = pd.DataFrame(data=data)
# get max values
out_data = data.groupby(['col1', 'col2', 'col3']).max()
Output:
col1 col2 col3 col4
A B C 12
K L S 15
P Q R 12
U V R 20
Upvotes: 0
Reputation: 896
Try this:
>>> import pandas as pd
>>> df = pd.read_csv("t.csv")
>>> df
col1 col2 col3 col4
0 A B C 12
1 A B C 8
2 A B C 10
3 P Q R 12
4 P Q R 11
5 K L S 1
6 K L S 15
7 U V R 20
>>> df.groupby(['col1']).max()
col2 col3 col4
col1
A B C 12
K L S 15
P Q R 12
U V R 20
Upvotes: 1
Reputation: 3331
You can us the groupby function with max() :
df = pd.DataFrame({'col1' : ['A','A','A','P','P'], 'col2' : ['B','B','B','Q','Q'],
'col3':['C','C','C','R','R'], 'col4':[12,8,10,12,11]})
df.groupby(['col1', 'col2']).max()
Out :
col1 col2 col3 col4
A B C 12
P Q R 12
Upvotes: 0