Reputation: 45
I have this data set:
| Country | Languages Spoken |
| --- | --- |
| Afghanistan | Dari Persian, Pashtu (both official), other Turkic and minor languages |
| Algeria | Arabic (official), French, Berber dialects |
| Andorra | Catalán (official), French, Castilian, Portuguese |
| Angola | Portuguese (official), Bantu and other African languages |
| Antigua and Barbuda | English (official), local dialects |
| Australia | English 79%, native and other languages |
I want to extract all the English-speaking countries. I think the easiest way would be to select every country that has the word 'English' in its languages. Ideally, I want a new dataframe with a column "English speaking" whose values are True or False.
Upvotes: 0
Views: 88
Reputation: 1979
One implementation of what you describe, using pandas.Series.str.contains:
>>> df
Country Languages Spoken
0 Afghanistan Dari Persian, Pashtu (both official), other Tu...
1 Algeria Arabic (official), French, Berber dialects
2 Andorra Catalán (official), French, Castilian, Portuguese
3 Angola Portuguese (official), Bantu and other African...
4 Antigua and Barbuda English (official), local dialects
5 Australia English 79%, native and other languages
>>>
>>> df['English speaking'] = df['Languages Spoken'].str.contains('English')
>>> df
Country Languages Spoken English speaking
0 Afghanistan Dari Persian, Pashtu (both official), other Tu... False
1 Algeria Arabic (official), French, Berber dialects False
2 Andorra Catalán (official), French, Castilian, Portuguese False
3 Angola Portuguese (official), Bantu and other African... False
4 Antigua and Barbuda English (official), local dialects True
5 Australia English 79%, native and other languages True
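
If the real data is messier than the sample, the same call can be made a little more forgiving. The sketch below is only one possible variant, not something the question requires: it assumes you want to ignore case, match "English" as a whole word, and treat missing language entries as False. The "Hypothetical row" entry and the names `mask`, `result`, and `english_countries` are made up for illustration.

```python
import pandas as pd

# A few rows from the question, plus one hypothetical row with a missing value
df = pd.DataFrame({
    "Country": ["Afghanistan", "Algeria", "Antigua and Barbuda",
                "Australia", "Hypothetical row"],
    "Languages Spoken": [
        "Dari Persian, Pashtu (both official), other Turkic and minor languages",
        "Arabic (official), French, Berber dialects",
        "English (official), local dialects",
        "English 79%, native and other languages",
        None,  # assumption: the real data may contain missing entries
    ],
})

# case=False ignores capitalisation, \b restricts the match to the whole word,
# and na=False turns missing values into False instead of NaN
mask = df["Languages Spoken"].str.contains(r"\benglish\b", case=False,
                                           regex=True, na=False)

# New dataframe holding only the country and the boolean flag
result = pd.DataFrame({"Country": df["Country"], "English speaking": mask})

# Just the English-speaking countries, as a plain list
english_countries = result.loc[result["English speaking"], "Country"].tolist()
print(result)
print(english_countries)
```

Because the flag column is boolean, `result[result["English speaking"]]` filters the rows directly if you want the subset as a dataframe rather than a list.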
Upvotes: 1