Reputation: 25
I'm new to Python and I need to extract references from scientific literature. This is the code I'm using:
from refextract import extract_references_from_url
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
print(references)
How can I write this printed output to an .xls file? Thank you so much.
Upvotes: 0
Views: 91
Reputation: 1604
You could use the pandas library to write the references to an Excel file.
from refextract import extract_references_from_url
import pandas as pd
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
print(references)
# convert to pandas dataframe
dfref = pd.DataFrame(references)
# write dataframe into excel
dfref.to_excel('./refs.xlsx')
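If the values in each reference turn out to be lists (an assumption about the shape of the refextract output, not something confirmed in this thread), the Excel cells will show Python list syntax such as ['...']. A minimal sketch of joining such values into plain strings first; the helper name flatten_reference and the sample data are hypothetical:

```python
def flatten_reference(reference, separator="; "):
    """Return a copy of one reference dict with list values joined into strings."""
    flat = {}
    for key, value in reference.items():
        if isinstance(value, (list, tuple)):
            # join multi-valued fields so each cell holds one readable string
            flat[key] = separator.join(str(item) for item in value)
        else:
            flat[key] = value
    return flat


# Hypothetical sample shaped like the assumed output; not real extracted data.
sample_references = [
    {
        "author": ["A. Author", "B. Author"],
        "year": ["2015"],
        "raw_ref": ["[1] A. Author and B. Author, Some Title (2015)."],
    },
    {"raw_ref": ["[2] C. Author, Another Title."]},
]

rows = [flatten_reference(ref) for ref in sample_references]
print(rows[0]["author"])
```

With the real data, pd.DataFrame([flatten_reference(r) for r in references]).to_excel('./refs.xlsx', index=False) should then write plain strings; index=False leaves out the row-number column. Writing .xlsx from pandas also needs an Excel engine such as openpyxl installed.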
Upvotes: 3
Reputation: 160
After going through the refextract documentation, I found that your variable references
is a dictionary. To write such a dictionary to an Excel file, you can use pandas as follows:
import pandas as pd
# create a pandas dataframe using a dictionary
df = pd.DataFrame(data=references, index=[0])
# transpose the dataframe so that each key becomes a row
df = df.T
# write the dictionary to an excel file
df.to_excel('extracted_references.xlsx')
Upvotes: 1
Reputation: 146
You should have a look at xlsxwriter, a module for creating Excel files. Your code could then look like this:
import xlsxwriter
from refextract import extract_references_from_url
workbook = xlsxwriter.Workbook('References.xlsx')
worksheet = workbook.add_worksheet()
references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
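# write one reference per row in the first column
for row, reference in enumerate(references):
    # str() keeps this working whatever type each reference has
    worksheet.write(row, 0, str(reference))
# close() must be called (with parentheses) to save the file
workbook.close()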
(modified based upon https://xlsxwriter.readthedocs.io/tutorial01.html)
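To get one column per field instead of a single text cell per reference, the data can first be arranged as a header row plus one row per reference. This is only a sketch, assuming references is a list of dicts whose values may be lists; the function name references_to_table and the sample data are hypothetical:

```python
def references_to_table(references):
    """Build a header row plus one row per reference, filling missing fields with ''."""
    header = []
    for reference in references:
        for key in reference:
            if key not in header:
                header.append(key)  # keep columns in first-seen order

    table = [header]
    for reference in references:
        row = []
        for key in header:
            value = reference.get(key, "")
            if isinstance(value, (list, tuple)):
                # join multi-valued fields into one cell
                value = "; ".join(str(item) for item in value)
            row.append(str(value))
        table.append(row)
    return table


# Hypothetical sample; not real extracted data.
sample_references = [
    {"author": ["A. Author", "B. Author"], "year": ["2015"]},
    {"author": ["C. Author"], "title": ["Another Title"]},
]

table = references_to_table(sample_references)
for line in table:
    print(line)
```

Each row of the table can then be written with worksheet.write_row(row_index, 0, row) inside a loop such as for row_index, row in enumerate(table), followed by workbook.close().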
Upvotes: 1