Reputation: 1
I am trying to iterate over the list below, which is retrieved from an HTML table:
table = pd.read_html(url)
I have extracted the third table:
table1 = table[2]
I am trying to count the rows up to a certain word in the variable table1. The structure of its output is shown below:
   0        1        2        <---------- headers
1  A        AA       BBB
2  B        BB       CCC
3  C        CC       CCC
4  D        DD       DDD
5  catchme  catchme  catchme
6  E        FF       FFF
7  G        GG       GGG
When I print the type of the variable in which the above output is stored, I get:
<class 'list'>
How can I count rows until Python finds a cell that contains "catch" in the above list? The expected output would be 4:
   0        1        2        <---------- headers
1  A        AA       BBB      <---------- 1
2  B        BB       CCC      <---------- 2
3  C        CC       CCC      <---------- 3
4  D        DD       DDD      <---------- 4
5  catchme  catchme  catchme  <---------- found "catch", exit loop
6  E        FF       FFF
7  G        GG       GGG
So far my code is as shown below:
for phrase in table1:  # variable that stores the list
    print(phrase)
    if 'catchme' in phrase:
        finalinput = ['catchme'] + [i]
    else:
        i = i + 1
But the above code only loops through A AA BBB and then exits the loop.
Thanks a lot in advance.
Upvotes: 1
Views: 558
Reputation: 588
A really simple solution is to iterate over the rows of the table (a DataFrame) and break when you encounter the condition:
catchphrase = 'DD'
catchphrase_row = None
# iterrows() yields (index, row) pairs, where each row is a Series
for index, row in table1.iterrows():
    # str() guards against non-string cells such as NaN or numbers
    if any(catchphrase in str(r) for r in row.to_list()):
        catchphrase_row = index
        break
# Use pandas just to show the data
print(pd.DataFrame(table1))
# Show where the catchphrase was found
print(f'{catchphrase} found in row {catchphrase_row}')
Output:
   0   1    2
0  A  AA  AAA
1  B  BB  BBB
2  C  CC  CCC
3  D  DD  DDD
4  E  EE  EEE
5  F  FF  FFF
6  G  GG  GGG
DD found in row 3
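To tie this back to the question's expected output of 4, here is a minimal self-contained sketch of the same loop. The sample DataFrame is hand-built to mirror the table in the question (an assumption; the real data comes from pd.read_html), and rows_before_match is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample data mirroring the question's table (assumed; the real table
# would come from pd.read_html(url)[2]).
table1 = pd.DataFrame([
    ["A", "AA", "BBB"],
    ["B", "BB", "CCC"],
    ["C", "CC", "CCC"],
    ["D", "DD", "DDD"],
    ["catchme", "catchme", "catchme"],
    ["E", "FF", "FFF"],
    ["G", "GG", "GGG"],
])


def rows_before_match(df, needle):
    """Count the rows that precede the first row with a cell containing needle.

    Returns None when no cell contains needle.
    """
    for position, (_, row) in enumerate(df.iterrows()):
        # str() lets the substring test work on non-string cells too
        if any(needle in str(cell) for cell in row):
            return position
    return None


count = rows_before_match(table1, "catch")
print(count)  # 4
```

Using enumerate gives a positional count that does not depend on the index labels, so it still works if the DataFrame's index does not start at 0.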
Upvotes: 1
Reputation: 395
I think the first element of your list might be a pandas DataFrame object.
Try the following:
table[0][table[0].isin(['catchme']).any(axis=1)].index[0]
For example, I read from the following URL and got a list object as output, but when I traversed it I found a DataFrame object at the first index.
Example code:
import pandas as pd
df = pd.read_html("https://www.railyatri.in/trains-between-stations")
df[0][df[0].isin(['Madgaon (MAO)']).any(axis=1)].index[0]
# output 4
This happens because pandas.read_html() returns a list of DataFrames rather than a single DataFrame object.
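One caveat with the isin approach: isin compares whole cell values, so it finds 'catchme' but not the fragment "catch" that the question asks about. The sketch below contrasts the two on a small hand-built DataFrame (assumed sample data); first_position is a hypothetical helper name introduced only for illustration.

```python
import pandas as pd

# Small assumed sample; a real table would be one element of pd.read_html(url)
df = pd.DataFrame([
    ["A", "AA", "BBB"],
    ["B", "BB", "CCC"],
    ["C", "CC", "CCC"],
    ["D", "DD", "DDD"],
    ["catchme", "catchme", "catchme"],
    ["E", "FF", "FFF"],
])


def first_position(mask):
    """Return the 0-based position of the first True in a boolean Series, or None."""
    return int(mask.to_numpy().argmax()) if mask.any() else None


# Exact match: a row qualifies only if a cell equals the value entirely
first_exact = first_position(df.isin(["catchme"]).any(axis=1))

# Exact match on a fragment finds nothing, because no cell equals "catch"
exact_on_fragment = first_position(df.isin(["catch"]).any(axis=1))

# Substring match: convert cells to strings, then test each column
contains = df.astype(str).apply(lambda col: col.str.contains("catch", regex=False))
first_partial = first_position(contains.any(axis=1))

print(first_exact, exact_on_fragment, first_partial)
```

Passing regex=False treats the search term as a literal string, which avoids surprises if the word contains regex metacharacters.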
Upvotes: 2