Reinaldo Chaves

Reputation: 995

In pandas, how to count rows with groupby from the condition found in a column?

In Python 3 with pandas, I have this dataframe:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 88 entries, 17 to 650
Data columns (total 26 columns):
cpf                                88 non-null object
nome                               88 non-null object
nome_completo                      88 non-null object
partido                            88 non-null object
uf                                 88 non-null object
cargo_parlamentar                  88 non-null object
tipo                               88 non-null object
classe                             88 non-null object
numero                             88 non-null object
único                              88 non-null object
assunto                            88 non-null object
data_inicial                       88 non-null object
data_final                         88 non-null object
andamento                          88 non-null object
link                               88 non-null object
transparencia                      88 non-null object
conferencia                        88 non-null object
data_conferencia                   88 non-null object
resumo                             88 non-null object
observacao                         86 non-null object
link_noticia_tribunal_confiavel    33 non-null object
interessa                          87 non-null object
ministro_relator                   88 non-null object
processo_conectado                 8 non-null object
situacao                           88 non-null object
cadastro_push                      88 non-null object
dtypes: object(26)
memory usage: 18.6+ KB

Each row of this dataframe holds information about one legal proceeding (one court case per row).

The column "nome" has names of people, such as ALFREDO NASCIMENTO or RENAN CALHEIROS.

The "tipo" column has the types of lawsuits, only two types: AP and INQ.

I have counted how many APs and how many INQs there are for each name, and created a dataframe:

conta = candidatos_senado.groupby(['tipo','nome']).size().reset_index()
conta.columns = ['type_of_court_case', 'name', 'count']

   index  type_of_court_case  name                count
0  0      AP                  ALFREDO NASCIMENTO  1
1  1      AP                  IZALCI LUCAS        1
2  2      AP                  JOSÉ REINALDO       1
3  3      AP                  RENAN CALHEIROS     1
4  4      AP                  SÉRGIO PETECÃO      2
5  5      AP                  ZECA DO PT          2
6  6      INQ                 ALFREDO NASCIMENTO  5
7  7      INQ                 CRISTOVAM BUARQUE   1
8  8      INQ                 EDISON LOBÃO        7


But the count should only include rows that meet a condition in another column.

The column "interessa" contains either "sim" (yes) or "não" (no).

I only want to count the APs and INQs in rows where "interessa" is "sim"; rows without that value should be ignored.

Does anyone know how I can do this?

Upvotes: 1

Views: 238

Answers (1)


Reputation: 863501

I think you first need to filter the DataFrame by boolean indexing with isin, in case the interessa column can contain other values:

df = candidatos_senado[candidatos_senado["interessa"].isin(["sim", "não"])]

Then, if you also need to count by the interessa column:

conta = df.groupby(['tipo','nome','interessa']).size().reset_index(name='count')

If you want to use your original solution:

conta1 = df.groupby(['tipo','nome']).size().reset_index(name='count')

If you want to count only by the tipo column:

conta2 = df.groupby('tipo').size().reset_index(name='count')
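
Since the question asks to count only the rows where interessa is "sim", the filter can probably be narrowed to that single value. A minimal sketch on made-up data (the rows and the names sim and conta_sim are illustrative assumptions, not taken from the real candidatos_senado); rows marked "não" or left empty are dropped before grouping:

```python
import pandas as pd

# Illustrative data only; the real candidatos_senado has 26 columns.
candidatos_senado = pd.DataFrame({
    'tipo': ['INQ', 'INQ', 'INQ', 'AP', 'AP', 'AP'],
    'nome': ['C', 'C', 'C', 'C', 'D', 'D'],
    'interessa': ['sim', 'sim', 'não', 'sim', 'sim', None],
})

# Keep only rows marked "sim"; a missing value compares as False, so it is dropped too.
sim = candidatos_senado[candidatos_senado['interessa'] == 'sim']

# Count the remaining rows for each (tipo, nome) pair.
conta_sim = sim.groupby(['tipo', 'nome']).size().reset_index(name='count')
print(conta_sim)
```

Pairs that have no "sim" row at all do not appear in the result, which matches the request to ignore those rows.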


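import pandas as pd

candidatos_senado = pd.DataFrame({'tipo':['INQ','INQ','INQ','AP','AP','AP'],
                                  'interessa':['sim','ABC','sim','d','não','não'],
                                  'val':[7,8,9,4,2,3],
                                  'nome':list('CDCDCD')})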

print (candidatos_senado)
  tipo interessa  val nome
0  INQ       sim    7    C
1  INQ       ABC    8    D
2  INQ       sim    9    C
3   AP         d    4    D
4   AP       não    2    C
5   AP       não    3    D

df = candidatos_senado[candidatos_senado["interessa"].isin(["sim", "não"])]

conta = df.groupby(['tipo','nome','interessa']).size().reset_index(name='count')
print (conta)
  tipo nome interessa  count
0   AP    C       não      1
1   AP    D       não      1
2  INQ    C       sim      2

conta1 = df.groupby(['tipo','nome']).size().reset_index(name='count')
print (conta1)
  tipo nome  count
0   AP    C      1
1   AP    D      1
2  INQ    C      2

conta2 = df.groupby('tipo').size().reset_index(name='count')
print (conta2)
  tipo  count
0   AP      2
1  INQ      2
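
A possible variation, in case every (tipo, nome) pair should stay in the result even when it has no "sim" row: instead of filtering first, sum a boolean mask per group. This is only a sketch on made-up data, and the names is_sim and conta_sim are assumptions for illustration:

```python
import pandas as pd

# Illustrative data only.
candidatos_senado = pd.DataFrame({
    'tipo': ['INQ', 'INQ', 'AP', 'AP'],
    'nome': ['C', 'D', 'C', 'C'],
    'interessa': ['sim', 'não', 'sim', 'sim'],
})

# True where the row is marked "sim"; summing booleans counts the True values.
is_sim = candidatos_senado['interessa'].eq('sim')

# Group the mask by tipo and nome, so pairs without any "sim" get a count of 0.
conta_sim = (is_sim
             .groupby([candidatos_senado['tipo'], candidatos_senado['nome']])
             .sum()
             .reset_index(name='count'))
print(conta_sim)
```

The filtering approach drops pairs with no "sim" rows, while this one reports them with a count of 0, so the choice depends on which output is wanted.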

Upvotes: 1
