creating a new dataframe using boolean masks

Question

I have a dataframe with a column called text that holds the text and a column called lang that holds the language the text is written in. I am trying to create a secondary dataframe containing only the text written in English (rows where lang has the value en). The dataframe also contains other values, so I can't just copy it. This is what I tried:

english_only = df['lang'] == 'en'
df_2 = pd.DataFrame(df[english_only]['text'],columns = ['text','sentiment'])

When I run the code, I get a dataframe of the same length as the original one, but it only contains NaN values. How can I solve this?

jezrael · Accepted Answer

The DataFrame constructor is not necessary here. Use DataFrame.loc to filter the rows with the mask (boolean indexing) and to select the columns by passing their names in a list. This solution works only if df already contains a sentiment column:

df_2 = df.loc[english_only, ['text','sentiment']]
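A minimal runnable sketch of the answer above. The sample df, including its user column and the sentiment values, is hypothetical data made up for illustration; only the mask and the .loc call come from the question and answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df
df = pd.DataFrame({
    "text": ["Hello world", "Bonjour le monde", "Good morning", "Hola"],
    "lang": ["en", "fr", "en", "es"],
    "sentiment": [0.8, 0.5, 0.9, 0.1],
    "user": ["a", "b", "c", "d"],
})

# Boolean mask: True for rows whose lang is 'en'
english_only = df["lang"] == "en"

# .loc[rows, columns]: rows picked by the mask, columns picked by name
df_2 = df.loc[english_only, ["text", "sentiment"]]
print(df_2)
```

The result keeps the original index labels (0 and 2 here); call df_2.reset_index(drop=True) if you want a fresh 0..n-1 index.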

If you want to add the sentiment column later, select only the text column:

df_2 = df.loc[english_only, ['text']]
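A sketch of adding the sentiment column afterwards. The score_sentiment function is a hypothetical placeholder for whatever actually produces the sentiment values; the .copy() call is an assumption on my part that makes df_2 an explicit independent frame, so assigning a new column to it does not risk pandas' SettingWithCopyWarning.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df
df = pd.DataFrame({
    "text": ["Hello world", "Bonjour le monde", "Good morning"],
    "lang": ["en", "fr", "en"],
})

english_only = df["lang"] == "en"

# Select only the English rows and the text column, as an explicit copy
df_2 = df.loc[english_only, ["text"]].copy()

# Hypothetical scorer: a toy rule used only to fill the new column
def score_sentiment(text: str) -> float:
    return 1.0 if "good" in text.lower() else 0.0

# Add the sentiment column computed from the text column
df_2["sentiment"] = df_2["text"].apply(score_sentiment)
print(df_2)
```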
