Cesar
Cesar

Reputation: 585

How to select certain rows from a dataframe in pandas

1- How can I select 5 different values inside a column ?

Example:

**Car**

Fiat
Fiat
Hyundai
Mitsubish
Kia
Kia
Hyundai
renault
porsche

How can I select the rows that has (Fiat, hyundai , Kia) for example ?

I tried this:

df.loc[df['cars'].isin("Fiat", "hyundai" , "Kia")]

But no success

Second problem:

In R we have pipe operator in order to reduce the number of new Copies*. What is the best solution in python ?

*Update - Is it a good way to create manipulation data of multiple steps ?

In this example:

df1 = df1.dropna()
df1 = df1[(df1['Meses'] != 'Total') & (df1['Orcado x Realizado'] == 'Realizado') & (df1['Area Negocio'] == 'Total das Áreas de Negócios')]
df1['Meses'] = df1['Meses'].replace({'M1': '01', 'M2': '02', 'M3': '03', 'M4': '04', 'M5': '05', 'M4': '04', 'M5': '05', 'M6': '06', 'M7': '07', 'M8': '08', 'M9': '09', 'M10': '10', 'M11': '11', 'M12': '12'})
df1['Date'] = pd.to_datetime(df1.Ano.astype(str) + '-' + df1.Meses.astype(str))
df1['Values'] = (df1['Values'] / 1000000)
df1 = df1[['Date', 'Contas Resultado', 'Values']]

Is there a way to make this code shorter, without having to create new variables ?

Upvotes: 1

Views: 233

Answers (1)

Ian Thompson
Ian Thompson

Reputation: 3305

For the first question:

With this code:

df.loc[df['cars'].isin("Fiat", "hyundai" , "Kia")]

You need to use a list:

df.loc[df['cars'].isin(["Fiat", "hyundai" , "Kia"])]

For the second question:

I'm not sure what you mean by creating new variables. It looks like you only have one (df1)

Edit

This code:

df1 = df1.dropna()
df1 = df1[(df1['Meses'] != 'Total') & (df1['Orcado x Realizado'] == 'Realizado') & (df1['Area Negocio'] == 'Total das Áreas de Negócios')]
df1['Meses'] = df1['Meses'].replace({'M1': '01', 'M2': '02', 'M3': '03', 'M4': '04', 'M5': '05', 'M4': '04', 'M5': '05', 'M6': '06', 'M7': '07', 'M8': '08', 'M9': '09', 'M10': '10', 'M11': '11', 'M12': '12'})
df1['Date'] = pd.to_datetime(df1.Ano.astype(str) + '-' + df1.Meses.astype(str))
df1['Values'] = (df1['Values'] / 1000000)
df1 = df1[['Date', 'Contas Resultado', 'Values']]

Can be reduced to:

df1.dropna(inplace = True) # shortened
df1 = df1[(df1['Meses'] != 'Total') & (df1['Orcado x Realizado'] == 'Realizado') & (df1['Area Negocio'] == 'Total das Áreas de Negócios')]
df1['Meses'].replace({'M1': '01', 'M2': '02', 'M3': '03', 'M4': '04', 'M5': '05', 'M4': '04', 'M5': '05', 'M6': '06', 'M7': '07', 'M8': '08', 'M9': '09', 'M10': '10', 'M11': '11', 'M12': '12'}, inplace = True) # shortened
df1['Date'] = pd.to_datetime(df1.Ano.astype(str) + '-' + df1.Meses.astype(str))
df1['Values'] = (df1['Values'] / 1000000)
df1 = df1[['Date', 'Contas Resultado', 'Values']]

Can't really do much more "shortening" besides using the inplace= parameters when available.

Upvotes: 1

Related Questions