Pandas - how to read a table from clipboard

Question

I am trying to scrape a table from a web page.

                    
                            
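<tr>
  <td>AT00BUWOG001</td><td>P</td><td></td><td>142</td><td>BUWOG</td>
  <td>124 184 779</td><td>16 019,84</td><td>12 476,29</td><td>2018-07-31</td>
  <td>H</td><td>1,28</td><td>14,00</td><td>2,30</td>
</tr>
<tr>
  <td>PLBRSTM00015</td><td>P</td><td>LA</td><td>180</td><td>CALATRAVA</td>
  <td>15 000 000</td><td>3,45</td><td>7,93</td><td>2017-03-31</td>
  <td>H</td><td>0,44</td><td>0,00</td><td>0,00</td>
</tr>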

I tried pandas read_clipboard(), but data from one column ends up in different columns because some cells in the table are empty.

           ISIN Market Segment    ...             PBV    PE  Div Yield
0  PLNFI0600010      P      LA    ...      2018-12-31     H       0,14
1  PLNFI0800016      P     141    ...               H  0,55     160,00
2  PL11BTS00015      P     650    ...               J  9,44      22,60
3  PL4FNMD00013      P     641    ...               H  1,25       6,80
4  PLABCDT00014      R     612    ...               H  0,94       0,00
5  PLABMSD00015      P     411    ...            0,00  0,00       0,00
6  PLAB00000019      P     612    ...               H  0,39       5,10
7  PLACSA000014      P     541    ...               J  4,20      13,00
8  PLACTIN00018      P     612    ...               H  0,51       0,00
9  PLADVIV00015      P     720    ...               H  2,07       0,00

Can I set some parameters in read_clipboard() so that every row has the same length as in the HTML and the data ends up in the right columns?
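The shifting can be reproduced without a clipboard. This is a minimal sketch, assuming the browser places the copied cells on the clipboard as tab-separated text (that depends on the browser and the page). The `clipboard_text` string is made up from the two rows above, truncated to five cells. read_clipboard() forwards its keyword arguments to read_csv and its documented default separator is '\s+', so read_csv stands in for it here.

```python
from io import StringIO

import pandas as pd

# Assumed clipboard contents: cells separated by single tabs, with an
# empty third cell in the second row (hypothetical, built from the rows above).
clipboard_text = (
    "PLBRSTM00015\tP\tLA\t180\tCALATRAVA\n"
    "AT00BUWOG001\tP\t\t142\tBUWOG\n"
)

# Splitting on runs of whitespace merges the two tabs around the empty cell,
# so every later value in that row moves one column to the left.
shifted = pd.read_csv(StringIO(clipboard_text), sep=r"\s+", header=None)

# Splitting on single tabs keeps the empty cell as NaN and the row aligned.
aligned = pd.read_csv(StringIO(clipboard_text), sep="\t", header=None)

print(shifted)
print(aligned)
```

If the clipboard really is tab-separated, pd.read_clipboard(sep="\t") may be enough on its own. That has not been checked against the page in question.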

prosti · Accepted Answer

I tried the read_html method and added the <table> wrapper manually.

You can also use BeautifulSoup to pretty-print the HTML and inspect its structure:

from bs4 import BeautifulSoup

html = "..."
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())

Here is what I tried:

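from io import StringIO

import pandas as pd

# Rows copied from the page, wrapped manually in <table> so read_html can find them.
html = """
<table>
<tr>
  <td>AT00BUWOG001</td><td>P</td><td></td><td>142</td><td>BUWOG</td>
  <td>124 184 779</td><td>16 019,84</td><td>12 476,29</td><td>2018-07-31</td>
  <td>H</td><td>1,28</td><td>14,00</td><td>2,30</td>
</tr>
<tr>
  <td>PLBRSTM00015</td><td>P</td><td>LA</td><td>180</td><td>CALATRAVA</td>
  <td>15 000 000</td><td>3,45</td><td>7,93</td><td>2017-03-31</td>
  <td>H</td><td>0,44</td><td>0,00</td><td>0,00</td>
</tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table>; take the first.
df = pd.read_html(StringIO(html), header=None)[0]
print(df)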

The output was:

             0  1    2    3          4            5          6          7   \
0  AT00BUWOG001  P  NaN  142      BUWOG  124 184 779  16 019,84  12 476,29   
1  PLBRSTM00015  P   LA  180  CALATRAVA   15 000 000        345        793   

           8  9    10    11   12  
0  2018-07-31  H  128  1400  230  
1  2017-03-31  H   44     0    0
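Note that the decimal commas were lost in the output above (3,45 became 345) because read_html treats "," as the thousands separator by default. Here is a hedged sketch of one way around that, using the `thousands` and `decimal` parameters of read_html. The HTML is a shortened, hypothetical excerpt of the table, not the full page.

```python
from io import StringIO

import pandas as pd

# Hypothetical excerpt: five of the thirteen cells per row, wrapped in <table> by hand.
html = """
<table>
  <tr><td>AT00BUWOG001</td><td></td><td>124 184 779</td><td>16 019,84</td><td>1,28</td></tr>
  <tr><td>PLBRSTM00015</td><td>LA</td><td>15 000 000</td><td>3,45</td><td>0,44</td></tr>
</table>
"""

# Treat a space as the thousands separator and a comma as the decimal mark.
df = pd.read_html(StringIO(html), header=None, thousands=" ", decimal=",")[0]
print(df)
print(df.dtypes)
```

The empty cell should still come back as NaN, so the columns stay aligned while the numeric columns are parsed as numbers.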
