Reputation: 995
Using Python 3 and pandas, I have this dataframe:
candidatos_senado.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 88 entries, 17 to 650
Data columns (total 26 columns):
cpf 88 non-null object
nome 88 non-null object
nome_completo 88 non-null object
partido 88 non-null object
uf 88 non-null object
cargo_parlamentar 88 non-null object
tipo 88 non-null object
classe 88 non-null object
numero 88 non-null object
único 88 non-null object
assunto 88 non-null object
data_inicial 88 non-null object
data_final 88 non-null object
andamento 88 non-null object
link 88 non-null object
transparencia 88 non-null object
conferencia 88 non-null object
data_conferencia 88 non-null object
resumo 88 non-null object
observacao 86 non-null object
link_noticia_tribunal_confiavel 33 non-null object
interessa 87 non-null object
ministro_relator 88 non-null object
processo_conectado 8 non-null object
situacao 88 non-null object
cadastro_push 88 non-null object
dtypes: object(26)
memory usage: 18.6+ KB
Each row of this dataframe holds information about a legal proceeding, one court case per row.
The column "nome" has names of people, such as:
FULANO DE TAL
BELTRANO DA SILVA
SICRANO APARECIDO
NINGUEM AUGUSTO
The "tipo" column holds the type of lawsuit, which takes only two values:
INQ
AP
I have counted how many APs and how many INQs there are for each name, and stored the result in a new dataframe:
conta = candidatos_senado.groupby(['tipo','nome']).size().reset_index()
conta.columns = ['type_of_court_case', 'name', 'count']
conta.reset_index()
   index type_of_court_case               name  count
0      0                 AP ALFREDO NASCIMENTO      1
1      1                 AP       IZALCI LUCAS      1
2      2                 AP      JOSÉ REINALDO      1
3      3                 AP    RENAN CALHEIROS      1
4      4                 AP     SÉRGIO PETECÃO      2
5      5                 AP         ZECA DO PT      2
6      6                INQ ALFREDO NASCIMENTO      5
7      7                INQ  CRISTOVAM BUARQUE      1
8      8                INQ       EDISON LOBÃO      7
...
But the count should only include rows that meet a condition in another column.
The column "interessa" contains either "sim" or "não".
I only want to count an AP or INQ when the row has "sim" in the "interessa" column; rows that do not meet this condition should be ignored.
Does anyone know how I can do this?
Upvotes: 1
Views: 238
Reputation: 863501
I think you first need to filter the DataFrame by boolean indexing with isin, in case the interessa column can contain other values:
df = candidatos_senado[candidatos_senado["interessa"].isin(["sim", "não"])]
Then, if you also need to count by the interessa column:
conta = df.groupby(['tipo','nome','interessa']).size().reset_index(name='count')
If you want to use your original solution:
conta1 = df.groupby(['tipo','nome']).size().reset_index(name='count')
If you want to count only by the tipo column:
conta2 = df.groupby('tipo').size().reset_index(name='count')
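Since the question asks to count only the rows marked "sim", a stricter filter than the isin one above may be closer to what is wanted. This is only a minimal sketch: it assumes the values in interessa are exactly the lowercase string "sim" with no surrounding whitespace, and the toy data and the names only_sim and conta_sim are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
candidatos_senado = pd.DataFrame({
    'tipo': ['INQ', 'INQ', 'AP', 'AP', 'INQ'],
    'nome': ['A', 'A', 'A', 'B', 'B'],
    'interessa': ['sim', 'não', 'sim', 'não', None],
})

# Keep only the rows flagged "sim"; rows with "não" or a missing value drop out.
only_sim = candidatos_senado[candidatos_senado['interessa'] == 'sim']

# Same group-by as in the question, applied to the filtered rows only.
conta_sim = only_sim.groupby(['tipo', 'nome']).size().reset_index(name='count')
print(conta_sim)
```

Note that a (tipo, nome) pair with no "sim" row at all does not appear in the result, because its rows are removed before grouping.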
Sample:
import pandas as pd

candidatos_senado = pd.DataFrame({'tipo':['INQ','INQ','INQ','AP','AP','AP'],
                                  'interessa':['sim','ABC','sim','d','não','não'],
                                  'val':[7,8,9,4,2,3],
                                  'nome':list('CDCDCD')})
print(candidatos_senado)
  tipo interessa  val nome
0  INQ       sim    7    C
1  INQ       ABC    8    D
2  INQ       sim    9    C
3   AP         d    4    D
4   AP       não    2    C
5   AP       não    3    D
df = candidatos_senado[candidatos_senado["interessa"].isin(["sim", "não"])]
conta = df.groupby(['tipo','nome','interessa']).size().reset_index(name='count')
print(conta)
  tipo nome interessa  count
0   AP    C       não      1
1   AP    D       não      1
2  INQ    C       sim      2
conta1 = df.groupby(['tipo','nome']).size().reset_index(name='count')
print(conta1)
  tipo nome  count
0   AP    C      1
1   AP    D      1
2  INQ    C      2
conta2 = df.groupby('tipo').size().reset_index(name='count')
print(conta2)
  tipo  count
0   AP      2
1  INQ      2
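If every (tipo, nome) pair should stay in the result, with a count of 0 where no row is marked "sim", one possible alternative to filtering first is to sum a boolean mask per group. This is a sketch under the same assumption that the flag is exactly the string "sim"; the names is_sim and conta_sim are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample frame as in the answer above.
candidatos_senado = pd.DataFrame({'tipo': ['INQ', 'INQ', 'INQ', 'AP', 'AP', 'AP'],
                                  'interessa': ['sim', 'ABC', 'sim', 'd', 'não', 'não'],
                                  'val': [7, 8, 9, 4, 2, 3],
                                  'nome': list('CDCDCD')})

# True where the row is flagged "sim", False for any other value.
is_sim = candidatos_senado['interessa'].eq('sim')

# Summing booleans per group counts the True values,
# so groups without any "sim" row are kept with a count of 0.
conta_sim = (is_sim.groupby([candidatos_senado['tipo'], candidatos_senado['nome']])
                   .sum()
                   .reset_index(name='count'))
print(conta_sim)
```

With this sample, only the INQ/C pair has a non-zero count (2); the other three pairs are listed with 0 instead of being dropped.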
Upvotes: 1