Vinoj Raj
Vinoj Raj

Reputation: 25

How to Extract the result from python into a xls file

I'm a novice in python and I need to extract references from scientific literature. Following is the code I'm using

from refextract import extract_references_from_url
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
print(references)

So, Please guide me on how to extract this printed information into a Xls file. Thank you so much.

Upvotes: 0

Views: 91

Answers (3)

drops
drops

Reputation: 1604

You could use the pandas library to write the references into excel.

from refextract import extract_references_from_url
import pandas as pd

references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
print(references)

# convert to pandas dataframe
dfref = pd.DataFrame(references)

# write dataframe into excel
dfref.to_excel('./refs.xlsx')

Upvotes: 3

Hit2Mo
Hit2Mo

Reputation: 160

After going through the documentation of refextract here, I found that your variable references is a dictionary. For converting such a dictionary to python you can use Pandas as follows-

import pandas as pd
# create a pandas dataframe using a dictionary
df = pd.DataFrame(data=references, index=[0])
# Take transpose of the dataframe 
df = (df.T)
# write the dictionary to an excel file
df.to_excel('extracted_references.xlsx')

Upvotes: 1

Kasai kemono
Kasai kemono

Reputation: 146

You should have a look at xlsxwriter, a module for creating excel files. Your code could then look like this:

import xlsxwriter
from refextract import extract_references_from_url
workbook = xlsxwriter.Workbook('References.xlsx')
worksheet = workbook.add_worksheet()

references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')

row = 0
col = 0

worksheet.write(references)

workbook.close

(modified based upon https://xlsxwriter.readthedocs.io/tutorial01.html)

Upvotes: 1

Related Questions