Parsing inner td tag with BeautifulSoup4 in Python

Question

I am working on a small project, and I am having a hard time parsing the rows I need from some HTML using bs4.

HTML:

 
I need to extract -0.01% and 1.98% from these two lines.

I used:

import re

L = []
txt = parsed_html.find("table", {"id": "curr_table"}).find_all("td", {"class": re.compile('bold .*Font')})
for row in txt:
    L.append(row.text)
print(L)

but I am getting an empty list. Any solutions or other suggestions?

    
        
Date          Price  Open   High   Low    Vol.    Change %
Jul 15, 2016  98.78  99.02  99.30  98.51  30.14M  -0.01%
Jul 14, 2016  98.79  97.39  98.99  97.32  38.92M  1.98%

-0.01%
1.98%

alecxe · Accepted Answer

Your current approach does not work because class is a special multi-valued attribute in BeautifulSoup: a regular expression is not applied to the complete attribute value, but to the individual classes instead. This thread explains it in more detail:

BeautifulSoup returns empty list when searching by compound class names
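
A minimal sketch of what "multi-valued" means in practice. The markup is a hypothetical stand-in for the real page: only the id curr_table comes from the question, and the class names redFont and greenFont are assumptions guessed from the pattern 'bold .*Font'. BeautifulSoup exposes class as a list of separate values, so a pattern that can match one class on its own is the safer choice:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; the class names are assumed for illustration only.
html = """
<table id="curr_table">
  <tr><td>30.14M</td><td class="bold redFont">-0.01%</td></tr>
  <tr><td>38.92M</td><td class="bold greenFont">1.98%</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "curr_table"})

# class is multi-valued: it is stored as a list, not as one string
first_cell = table.find("td", class_=True)
print(first_cell["class"])

# A pattern that matches a single class value (no space in it)
cells = table.find_all("td", class_=re.compile(r"Font$"))
values = [td.get_text() for td in cells]
print(values)
```

If the real class names differ, only the pattern needs to change; the point is that it should not rely on the space between two classes.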

You can actually avoid checking class values and, instead, just grab the td elements having % at the end of the text:

table = parsed_html.find("table", {"id":"curr_table"})
for td in table.find_all("td", text=lambda text: text and text.endswith('%')):
    print(td.get_text())
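
For reference, a self-contained sketch of the same idea that can be run without the original page. The HTML is a hypothetical reduction of the table, and ends_with_percent is a helper name made up for this example. It uses string=, the newer spelling of the text= argument (available from Beautiful Soup 4.4.0 on):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; only the id comes from the question.
html = """
<table id="curr_table">
  <tr><th>Vol.</th><th>Change %</th></tr>
  <tr><td>30.14M</td><td>-0.01%</td></tr>
  <tr><td>38.92M</td><td>1.98%</td></tr>
</table>
"""
parsed_html = BeautifulSoup(html, "html.parser")
table = parsed_html.find("table", {"id": "curr_table"})


def ends_with_percent(s):
    # s is the cell's .string; it is None for empty cells or cells with nested tags
    return s is not None and s.strip().endswith("%")


# Only td cells are searched, so the "Change %" header (a th) is skipped
changes = [td.get_text(strip=True) for td in table.find_all("td", string=ends_with_percent)]
print(changes)
```

The None check matters: without it, a cell that contains nested tags would raise an AttributeError inside the filter.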

I would actually use pandas to parse this well-formatted table into a DataFrame, which is quite convenient to work with. pandas has extensive documentation to help you understand how to work with a DataFrame:

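from io import StringIO

import pandas as pd

data = """
<table id="curr_table">
    <thead>
        <tr>
            <th>Date</th>
            <th>Price</th>
            <th>Open</th>
            <th>High</th>
            <th>Low</th>
            <th>Vol.</th>
            <th>Change %</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Jul 15, 2016</td>
            <td>98.78</td>
            <td>99.02</td>
            <td>99.30</td>
            <td>98.51</td>
            <td>30.14M</td>
            <td>-0.01%</td>
        </tr>
        <tr>
            <td>Jul 14, 2016</td>
            <td>98.79</td>
            <td>97.39</td>
            <td>98.99</td>
            <td>97.32</td>
            <td>38.92M</td>
            <td>1.98%</td>
        </tr>
    </tbody>
</table>
"""

# Wrap the markup in StringIO: recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(data))[0]
print(df)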

print("----")
print(df['Change %'].tolist())

Prints:

           Date  Price   Open   High    Low    Vol. Change %
0  Jul 15, 2016  98.78  99.02  99.30  98.51  30.14M   -0.01%
1  Jul 14, 2016  98.79  97.39  98.99  97.32  38.92M    1.98%
----
['-0.01%', '1.98%']
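
If the percentages are needed as numbers rather than strings, one possible follow-up is sketched below. It assumes a parser backend for read_html (lxml, or bs4 with html5lib) is installed, uses a reduced hypothetical copy of the table, and the column name Change is made up for this example:

```python
from io import StringIO

import pandas as pd

# Reduced, hypothetical copy of the table with just two columns
html = """
<table>
  <thead><tr><th>Date</th><th>Change %</th></tr></thead>
  <tbody>
    <tr><td>Jul 15, 2016</td><td>-0.01%</td></tr>
    <tr><td>Jul 14, 2016</td><td>1.98%</td></tr>
  </tbody>
</table>
"""

df = pd.read_html(StringIO(html))[0]

# Strip the trailing % sign and convert the remaining text to float
df["Change"] = df["Change %"].str.rstrip("%").astype(float)
changes = df["Change"].tolist()
print(changes)
```

The values stay in percent units (1.98 means 1.98%); divide by 100 if fractions are needed.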
