Martin Bouhier

Reputation: 361

Data Analyzed by Python CSV

I convert a CSV into a list called a and classify each row with a conditional, but it is not working. When there is an 'Estable' value in the Analisis column, every Cliente ends up with the 'Stable' status, which is not what I need. For clients whose Analisis is not 'Estable' (AAA and BBB here) I want the status to be 'NoAnalyzed', as shown below the code.

import pandas as pd

a = [['Cliente', 'Fecha', 'Variables', 'Dia Previo', 'Mayor/Menor', 'Dia a Analizar', 'Analisis'], 
['AAA', '27/12/2017', 'ECPM_medio', '0.41', 'Dentro del Margen', '0.35', 'Incremento'],
['BBB', '27/12/2017', 'ECPM_medio', '1.06', 'Dentro del Margen', '1.06', 'Alerta'],
['CCC', '27/12/2017', 'ECPM_medio', '1.06', 'Dentro del Margen', '1.06', 'Estable']]



headers = a.pop(0)
df = pd.DataFrame(a, columns = headers)
df['Analisis']


for elemento in df['Analisis']:
    if elemento == 'Estable':
        df['Status'] = 'Stable: The client''s performance was Stable'
    else:
        df['Status'] = 'NoAnalyzed'


df1= df.groupby(['Cliente','Fecha', 'Status']).size()
df1

output:
>>>
Cliente  Fecha       Status                                    
AAA      27/12/2017  Stable: The clients performance was Stable    1
BBB      27/12/2017  Stable: The clients performance was Stable    1
CCC      27/12/2017  Stable: The clients performance was Stable    1

I need:
>>>
Cliente  Fecha       Status                                    
AAA      27/12/2017  NoAnalyzed    1
BBB      27/12/2017  NoAnalyzed    1
CCC      27/12/2017  Stable: The clients performance was Stable    1

Upvotes: 1

Views: 71

Answers (2)

jezrael

Reputation: 863701

I believe you need numpy.where or map; in pandas it is best to avoid loops, because they are slow:

import numpy as np
mask = df['Analisis'] == 'Estable'
# label rows where mask is True as stable, all other rows as NoAnalyzed
df['Status'] = np.where(mask, 'Stable: The client''s performance was Stable', 'NoAnalyzed')

Or, similarly, with map:

# map each boolean in mask to its status label
d = {True: 'Stable: The client''s performance was Stable', False: 'NoAnalyzed'}
df['Status'] = mask.map(d)

print (df)
  Cliente       Fecha   Variables Dia Previo        Mayor/Menor  \
0     AAA  27/12/2017  ECPM_medio       0.41  Dentro del Margen   
1     BBB  27/12/2017  ECPM_medio       1.06  Dentro del Margen   
2     CCC  27/12/2017  ECPM_medio       1.06  Dentro del Margen   

  Dia a Analizar    Analisis                                      Status  
0           0.35  Incremento                                  NoAnalyzed  
1           1.06      Alerta                                  NoAnalyzed  
2           1.06     Estable  Stable: The clients performance was Stable  
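For reference, here is a self-contained sketch of the np.where approach applied to the question's data, ending with the same groupby the question uses. The constant name STABLE is only for readability. Its text is written as 'clients' because the original literal 'The client''s ...' is actually two adjacent Python string literals, which Python joins without an apostrophe (that is why the printed outputs say "clients").

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the first row holds the column names
a = [['Cliente', 'Fecha', 'Variables', 'Dia Previo', 'Mayor/Menor', 'Dia a Analizar', 'Analisis'],
     ['AAA', '27/12/2017', 'ECPM_medio', '0.41', 'Dentro del Margen', '0.35', 'Incremento'],
     ['BBB', '27/12/2017', 'ECPM_medio', '1.06', 'Dentro del Margen', '1.06', 'Alerta'],
     ['CCC', '27/12/2017', 'ECPM_medio', '1.06', 'Dentro del Margen', '1.06', 'Estable']]

# Hypothetical constant name; same text as the label in the question's output
STABLE = 'Stable: The clients performance was Stable'

headers = a.pop(0)
df = pd.DataFrame(a, columns=headers)

# Vectorized: one boolean per row, then one label chosen per row
mask = df['Analisis'] == 'Estable'
df['Status'] = np.where(mask, STABLE, 'NoAnalyzed')

# Same grouping as in the question
df1 = df.groupby(['Cliente', 'Fecha', 'Status']).size()
print(df1)
```

Because np.where evaluates the condition row by row, AAA and BBB get 'NoAnalyzed' and only CCC gets the stable label, which is the output the question asks for.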

Upvotes: 3

Sociopath

Reputation: 13426

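The problem is that you are assigning a single value to the column rather than a list, array, or Series. A single value assigned to a column is repeated in every row, so each pass through the loop overwrites the whole column and only the last row's result (CCC, 'Estable') survives. I suggest building a list and assigning it to your df['Status'] column.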
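A minimal sketch of that overwriting, assuming a trimmed-down DataFrame with only two of the question's columns and a shortened 'Stable' label: it runs the question's loop and records the Status column after each pass.

```python
import pandas as pd

df = pd.DataFrame({'Cliente': ['AAA', 'BBB', 'CCC'],
                   'Analisis': ['Incremento', 'Alerta', 'Estable']})

# Mimic the question's loop, recording the whole column after each assignment
snapshots = []
for elemento in df['Analisis']:
    if elemento == 'Estable':
        df['Status'] = 'Stable'      # scalar: every row becomes 'Stable'
    else:
        df['Status'] = 'NoAnalyzed'  # scalar: every row becomes 'NoAnalyzed'
    snapshots.append(df['Status'].tolist())

for snap in snapshots:
    print(snap)
# Only the last element visited ('Estable') decides the final column
```

The first two passes fill all three rows with 'NoAnalyzed', and the third pass replaces all three with 'Stable'.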

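status = []
for elemento in df['Analisis']:
    # collect one label per row instead of overwriting the whole column
    if elemento == 'Estable':
        status.append('Stable: The client''s performance was Stable')
    else:
        status.append('NoAnalyzed')

# assign the full list: one value per row
df['Status'] = status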

This should work.
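The same list-building idea can be written more compactly as a list comprehension. This is a sketch on a reduced DataFrame (two columns from the question); STABLE is a hypothetical constant holding the label text shown in the question's output.

```python
import pandas as pd

df = pd.DataFrame({'Cliente': ['AAA', 'BBB', 'CCC'],
                   'Analisis': ['Incremento', 'Alerta', 'Estable']})

# Hypothetical constant; same text as the label in the question's output
STABLE = 'Stable: The clients performance was Stable'

# Build one label per row, then assign the whole list at once
df['Status'] = [STABLE if elemento == 'Estable' else 'NoAnalyzed'
                for elemento in df['Analisis']]
print(df)
```

It still loops in Python, so for large frames the vectorized np.where or map versions from the other answer are usually preferable.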

Upvotes: 0
