Reputation: 43
I have a dataframe containing text in a column called text, and the language each text is written in is stored in the column lang. What I am trying to do is create a second dataframe containing only the text written in English (rows that have the value en in the lang column). The dataframe also contains other values, so I can't just copy it. This is what I tried:
english_only = df['lang'] == 'en'
df_2 = pd.DataFrame(df[english_only]['text'],columns = ['text','sentiment'])
When I run the code, I get a dataframe of the same length as the original one, but it contains only NaN values. How can I solve this?
Upvotes: 1
Views: 497
Reputation: 863741
The DataFrame constructor is not necessary here. Filter the rows with the boolean mask (boolean indexing) and select the columns by passing a list of column names to DataFrame.loc (this solution works if df contains a sentiment column):
df_2 = df.loc[english_only, ['text','sentiment']]
If you want to add the sentiment column later:
df_2 = df.loc[english_only, ['text']]
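Both variants can be tried end to end with a small self-contained sketch. The sample rows, the extra column named other, the name df_3, and the placeholder sentiment scores are made up for illustration; only the column names text, lang and sentiment come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names text, lang and
# sentiment come from the question, the rest is invented.
df = pd.DataFrame({
    "text": ["hello world", "bonjour le monde", "good morning", "hola mundo"],
    "lang": ["en", "fr", "en", "es"],
    "sentiment": [0.8, 0.5, 0.9, 0.4],
    "other": [1, 2, 3, 4],
})

# Boolean mask: True for rows whose language is English.
english_only = df["lang"] == "en"

# Rows selected by the mask, columns selected by a list of names.
df_2 = df.loc[english_only, ["text", "sentiment"]]

# Variant that keeps only text; .copy() makes the result independent
# of df, so a column can be added later without affecting the original.
df_3 = df.loc[english_only, ["text"]].copy()
df_3["sentiment"] = [0.8, 0.9]  # placeholder scores, assumed for the example

print(df_2)
print(df_3)
```

The filtered frames keep the original row labels (0 and 2 here). If a fresh 0..n-1 index is preferred, calling reset_index(drop=True) on the result is one option.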
Upvotes: 1