Reputation: 1077
In pandas, I'd like to analyze groups if there is a single occurrence of a conditional value. I've included a sample dataframe with a first step attempt at identifying such groups below. So, let's say, in the data frame below, I want to filter the original data frame only for species of iris that ever had a sepal length greater than 6. In the last command, I'm counting the number of unique species groups that had a sepal length greater than 6 (so, at least I can count them).
But, what I really want is the original dataframe where I analyze rows only if the species had a sepal length greater than 6 (so, it would be a dataframe without the species "setosa" since they never have one).
The longer explanation is that I have a real dataset of users. Each user will have values in certain columns that may exceed a threshold value of interest. I haven't figured out how to analyze users who have these threshold values.
Perhaps a loop would be better. I might loop through each unique user name and look if any row with that user ever exceeds a certain value and gets some kind of new column (though I know loops are frowned upon in pandas so I'm posting here to see if there's some kind of well-known method of identifying groups by occurrence).
Thanks and let me know if I can make this question any more clear!
import pandas as pd
import seaborn as sns
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
iris = sns.load_dataset('iris')
iris['longsepal'] = iris['sepal_length'] > 7
iris['longpetal'] = iris['petal_length'] > 5
iris.groupby(['longsepal'])['species'].nunique()
Upvotes: 0
Views: 65
Reputation: 107587
Consider groupby().transform()
to calculate inline max
aggregates to be later filtered on its value by species. Technically, the > 7
only returns one species as veriscolor
max reaches 7.0
. Below shows the operator and functional form of inequality logic.
iris['longsepal'] = iris.groupby(['species'])['sepal_length'].transform('max')
iris['longpetal'] = iris.groupby(['species'])['petal_length'].transform('max')
# DATA FILTERS
longsepal_iris = iris.loc[iris['longsepal'] > 7] # GREATER THAN OPERATOR FORM: >
longsepal_iris = iris.loc[iris['longsepal'].gt(7)] # GREATER THAN FUNCTIONAL FORM: gt()
longpetal_iris = iris.loc[iris['longpetal'] > 5] # GREATER THAN OPERATOR FORM: >
longpetal_iris = iris.loc[iris['longpetal'].gt(5)] # GREATER THAN FUNCTIONAL FORM: gt()
# SPECIES
longsepal_iris['species'].unique()
# ['virginica']
longpetal_iris['species'].unique()
# ['versicolor' 'virginica']
Upvotes: 3