Reputation: 27
I would like to analyse a list of orchids (input_df) if it contains orchid species that are on one of six lists. I import these lists from an xlsx file with six sheets as dictionary containing the six lists as DataFrames (orchid_checklists).
import pandas as pd
orchid_checklists = pd.read_excel('\\orchid_checklists.xlsx', sheet_name=None)
input_df = pd.read_excel('\\input.xlsx')
input_df['Orchideen-Checkliste'] = ''
With the following for loop with if condition I am trying to add the name of the Checklist into the field corresponding to the item in input_df['Input Name'] in the column ['Orchideen-Checkliste'] to visualize to what checklist one should refer.
for item in input_df['Input Name']:
for list_name, sheet in orchid_checklists.items():
genus = item.split(' ')[0]
if genus in sheet['referenced'].values:
input_df['Orchideen-Checkliste'] = list_name
else:
pass
In my test input list there is one species called "Bulbophyllum pachyrachis" that should be found. Unfortunately the name of the list "CL_Bulbophyllum" is put into all rows. I can´t figure out why.
In the next step I want to check if the species name is also in the column "exceptions" in either of my checklists. In that case that would not be the correct checklist. In these cases the full species name (e.g. "Aerangis ellisii", see CL_App_I and CL_III below) is found in the column "referenced" of another list. I haven´t started coding this exception, because I am still stuck on the part before, but any pointers how to approach this are highly welcomed.
This is the input data:
Input Name Orchideen-Checkliste
0 Sobralia madisonii
1 Stelis cocornaensis
2 Stelis gelida
3 Braemia vittata
4 Brassia escobariana
5 Aspasia silvana
6 Bulbophyllum maximum
7 Bulbophyllum pachyrachis
8 Chondroscaphe amabilis
9 Dresslerella hispida
10 Elleanthus sodiroi
11 Maxillaria mathewsii
orchid_checklists:
CL_III
referenced exceptions
0 Aerangis Aerangis ellisii
1 Angraecum NaN
2 Ascocentrum NaN
3 Bletilla NaN
4 Brassavola NaN
5 Calanthe NaN
6 Catasetum NaN
7 Miltonia NaN
8 Miltoniopsis NaN
9 Renanthera NaN
10 Renantherella NaN
11 Rhynchostylis NaN
12 Rossioglossum NaN
13 Vanda NaN
14 Vandopsis NaN
CL_App_I
referenced exceptions
0 Paphiopedilum NaN
1 Phragmipedium NaN
2 Aerangis ellisii NaN
3 Cattleya jongheana NaN
4 Cattleya lobata NaN
5 Dendrobium cruentum NaN
6 Mexipedium xerophyticum NaN
7 Peristeria elata NaN
8 Renanthera imshootiana NaN
CL_Bulbophyllum
referenced exceptions
0 Acrochaene NaN
1 Bulbophyllum NaN
2 Chaseella NaN
3 Codonosiphon NaN
4 Drymoda NaN
5 Monomeria NaN
6 Monosepalum NaN
7 Pedilochilus NaN
8 Succoglossum NaN
9 Sunipia NaN
10 Trias NaN
Thank you in advance for your help!
Upvotes: 0
Views: 49
Reputation: 23783
input_df['Orchideen-Checkliste'] = list_name
Assigns a value to every item of that column because you did not specify a row indexer.
Without changing your process too much: enumerate the items in input_df['Input Name']
when iterating and use the enumeration to specify the row for the assignment.
for index,item in enumerate(input_df['Input Name']):
for list_name, sheet in orchid_checklists.items():
genus = item.split(' ')[0]
if genus in sheet['referenced'].values:
input_df.loc[index,'Orchideen-Checkliste'] = list_name
Upvotes: 2