Reputation: 65
I have a for loop that iterates over countries, and each country has either a yes or a no for each animal. I want the corresponding animal to be added to a list each time there is a yes. For example, I have a list like this:
Countries = ['Germany','France'..etc etc]
My DataFrame looks something like this:
animal Germany France
Rabbit yes yes
Bear no yes
...
I want a list that contains each animal once for every selected country where it has a yes. For the example above, I would want:
animal_list = [Rabbit, Rabbit, Bear]
My main code goes something like this. My attempt is below, but it doesn't work. Is there a clean way of doing it?
Countries = ['Germany','France'..etc etc]
animals_list = []
for country in Countries:
    animal_list = animal_list.append(df[df[country] == 'yes'],'animal'])
The for loop is required, so I can't do it in one step using pandas alone.
Upvotes: 0
Views: 758
Reputation: 65
I found a very simple solution which seems to do the trick for me.
Countries = ['Germany','France'..etc etc]
animals_list = []
for country in Countries:
    # Select the 'animal' values in rows where this country is 'yes'
    animals = list(df.loc[df[country] == 'yes', 'animal'])
    animals_list = animals_list + animals
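For reference, here is a minimal self-contained sketch of the same loop. The DataFrame is an assumption rebuilt from the two example rows in the question, and the lowercase name countries is only illustrative.

```python
import pandas as pd

# Sample data assumed from the two example rows shown in the question
df = pd.DataFrame({
    'animal': ['Rabbit', 'Bear'],
    'Germany': ['yes', 'no'],
    'France': ['yes', 'yes'],
})
countries = ['Germany', 'France']

animals_list = []
for country in countries:
    # The boolean mask keeps rows marked 'yes'; .loc then picks the animal column
    animals_list.extend(df.loc[df[country] == 'yes', 'animal'])

print(animals_list)  # ['Rabbit', 'Rabbit', 'Bear']
```

Using extend here avoids building a new list on every iteration, which is what animals_list + animals does.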
Upvotes: 0
Reputation: 432
Suppose you have a DataFrame like this:
import pandas as pd

data = {'animal': ['Rabbit', 'Bear'],
        'Germany': ['yes', 'no'],
        'France': ['yes', 'no']
        }
df = pd.DataFrame(data)
If the wanted countries are given in a list:
# In Python, try to use lowercase, underscore-separated names for your variables (PEP 8)
countries = ['Germany', 'France']
Then you can select those columns:
# Select the countries that you want
df_subset = df[df.columns.intersection(countries)]
And count the number of yes values for each animal:
animals_index_to_num_yes = df_subset.eq('yes').sum(axis=1)
In this way the list can be created very easily:
animals_list = []
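# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for index, animal in df['animal'].items():
    occurrences = animals_index_to_num_yes.get(index)
    # Repeat the animal once per 'yes' it has
    animals_list.extend(
        [animal] * occurrences
    )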
Notes:
Avoid for loops in Pandas as much as possible. In general, built-in methods perform better because they are vectorized. See this excellent answer for more.
Upvotes: 2
Reputation: 639
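animals_list = []
# Column names are case-sensitive and must match the DataFrame
country_list = ['Germany', 'France']
for i in range(len(df)):
    for country in country_list:
        # Add the animal once for each country where it has a 'yes'
        if df[country].iloc[i] == 'yes':
            animals_list.append(df.animal.iloc[i])
print(animals_list)
Output: ['Rabbit', 'Rabbit', 'Bear']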
Upvotes: 0