alekoj

Reputation: 15

Extracting multiple tables into a .csv

I have a CSV file of accession numbers that I am using to search uniprot.org for multiple variants of a protein. An example search is the following URL:

https://www.uniprot.org/uniprot/?query=KU168294+env&sort=score


import pandas as pd
import requests
from bs4 import BeautifulSoup
import csv

# Read the list of accession numbers
df = pd.read_csv('Env_seq_list.csv')

second_column_df = df['Accession']

for row in second_column_df:
    # Build the UniProt search URL for this accession and fetch the page
    theurl = 'https://www.uniprot.org/uniprot/?query=' + row + '+env&sort=score'
    page = requests.get(theurl).content

    # read_html returns a list of all tables on the page; take the last one
    df_list = pd.read_html(page)
    df = df_list[-1]

    # Keep only the rows whose 'Gene names' column is 'env'
    print(df.loc[df['Gene names'] == 'env'])

Printing inside the loop works fine and gives me the tables I'm after. I'm stuck at this point because I can't get pandas' df.to_csv function to work together with df.loc. Additionally, calling df.to_csv by itself only writes the last search result to the .csv, which I'm pretty sure is because the call sits inside the for loop, but I'm unsure how to fix this. Any help would be greatly appreciated :-)

Upvotes: 1

Views: 80

Answers (1)

Max Power

Reputation: 8954

I would suggest that you take the df you find each time through the loop and append it to a 'final' df. Then, outside the loop, you can run to_csv on that 'final' df. Code below:

final_df = pd.DataFrame()
for row in second_column_df:
    theurl = 'https://www.uniprot.org/uniprot/?query=' + row + '+env&sort=score'
    page = requests.get(theurl).content

    df_list = pd.read_html(page)
    df = df_list[-1]

    # Append this iteration's matching 'env' rows to the running result
    final_df = pd.concat([final_df, df.loc[df['Gene names'] == 'env']], axis=0)

# Write everything out once, after the loop has finished
final_df.to_csv('/path/to/save/csv')
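As a side note, a common variation is to collect each iteration's rows in a plain Python list and call pd.concat once at the end, which avoids copying the growing DataFrame on every pass. A minimal sketch with hypothetical in-memory frames standing in for the pd.read_html results (no network call):

```python
import pandas as pd

# Hypothetical stand-ins for the per-accession tables that pd.read_html
# would return from each UniProt search page.
tables = [
    pd.DataFrame({'Gene names': ['env', 'gag'], 'Entry': ['A0A0', 'B1B1']}),
    pd.DataFrame({'Gene names': ['env'], 'Entry': ['C2C2']}),
]

frames = []
for df in tables:
    # Filter to the 'env' rows, exactly as in the loop above
    frames.append(df.loc[df['Gene names'] == 'env'])

# Concatenate once, outside the loop; ignore_index renumbers the rows 0..n-1
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```

Either approach writes the same CSV; the list version just defers the concatenation to a single call.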

Upvotes: 2
