Tmiskiewicz

Reputation: 403

Extracting data from web page to CSV file, only last row saved

I'm faced with the following challenge: I want to collect the financial data for every listed company. I wrote code that does this, and for a single company the result looks like this:

Unnamed: 0    I Q 2017   II Q 2017  \
0     Przychody netto ze sprzedaży (tys. zł)         137         134   
1   Zysk (strata) z działal. oper. (tys. zł)        -423        -358   
2             Zysk (strata) brutto (tys. zł)        -501        -280   
3             Zysk (strata) netto (tys. zł)*        -399        -263   
4                      Amortyzacja (tys. zł)         134         110   
5                           EBITDA (tys. zł)        -289        -248   
6                           Aktywa (tys. zł)      27 845      26 530   
7                  Kapitał własny (tys. zł)*      22 852      22 589   
8                   Liczba akcji (tys. szt.)  13 921,975  13 921,975   
9                         Zysk na akcję (zł)      -0,029      -0,019   
10            Wartość księgowa na akcję (zł)       1,641       1,623   
11             Raport zbadany przez audytora           N           N

except that there are 464 such tables, one per company.

Unfortunately, when I try to save all 464 results to one CSV file, only the last result ends up in the file, not all 464. Could you help me save all of them? My code is below.

import requests
from bs4 import BeautifulSoup
import pandas as pd    

url = 'https://www.bankier.pl/gielda/notowania/akcje'
page = requests.get(url)

soup = BeautifulSoup(page.content,'lxml')
# Find the first table on the page
t = soup.find_all('table')[0]


# Read the table into a pandas DataFrame
df = pd.read_html(str(t))[0]

# Get the company names from the "Walor AD" column
names_of_company = df["Walor AD"].values

# All links to the financial-results pages, one per company
links = []

for i in range(len(names_of_company)):
    new_string = 'https://www.bankier.pl/gielda/notowania/akcje/' + names_of_company[i] + '/wyniki-finansowe'
    links.append(new_string)

############################################################################

for i in links:
    url2 = f'https://www.bankier.pl/gielda/notowania/akcje/{names_of_company[0]}/wyniki-finansowe'

    page2 = requests.get(url2)

    soup = BeautifulSoup(page2.content,'lxml')
    # Find the first table on the page
    t2 = soup.find_all('table')[0]
    df2 = pd.read_html(str(t2))[0]

    df2.to_csv('output.csv', index=False, header=None)

Upvotes: 1

Views: 434

Answers (1)

Troy D

Reputation: 2245

You've almost got it. You're just overwriting your CSV each time. Replace

df2.to_csv('output.csv', index=False, header=None)

with

with open('output.csv', 'a') as f:
    df2.to_csv(f, header=False)

in order to append to the CSV instead of overwriting it.
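
As a side note, the same append can be done without the explicit open() call, since to_csv also takes a mode argument; a minimal equivalent, keeping the index=False from your original call:

df2.to_csv('output.csv', mode='a', index=False, header=False)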

Also, your example doesn't work because this:

for i in links:
    url2 = f'https://www.bankier.pl/gielda/notowania/akcje/{names_of_company[0]}/wyniki-finansowe'

should be:

for i in links:
    url2 = i
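
Since links is built from names_of_company anyway, you could equally well drop the intermediate list and build each URL inside the loop; a minimal sketch of that variant:

for name in names_of_company:
    url2 = f'https://www.bankier.pl/gielda/notowania/akcje/{name}/wyniki-finansowe'
    page2 = requests.get(url2)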

When a page has no data table, skip it and move on to the next one:

    try:
        t2 = soup.find_all('table')[0]
        df2 = pd.read_html(str(t2))[0]

        with open('output.csv', 'a') as f:
            df2.to_csv(f, header=False)
    except:
        pass
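
Putting both fixes together, the whole download loop would look roughly like this (a sketch that reuses the imports and the links list built earlier in the question; the URL pattern and output file name are the asker's):

for link in links:
    page2 = requests.get(link)
    soup = BeautifulSoup(page2.content, 'lxml')

    try:
        # first table on the financial-results page
        t2 = soup.find_all('table')[0]
        df2 = pd.read_html(str(t2))[0]
    except (IndexError, ValueError):
        # no results table on this page, skip this company
        continue

    # append this company's rows to the shared CSV
    with open('output.csv', 'a') as f:
        df2.to_csv(f, header=False)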

Upvotes: 1
