Reputation: 99
I'm making an API request to pull some article bodies and ids. I would like to pull them into an Excel file where the ids are displayed in the first column and the corresponding bodies in the next column.
I managed to bring it this far, but for some reason the formatting in Excel is not right. There is a lot of data that I'm trying to pull here, but I want each value displayed in its own cell.
Can I specifically ask my script to transpose the data automatically?
import requests
import csv
import unicodedata
import getpass
import xlsxwriter

url = 'https://.../articles.json'
workbook = xlsxwriter.Workbook('bodies.xlsx')
worksheet = workbook.add_worksheet()

output_1 = []
output_2 = []
listOf = ([output_1], [output_2])

while url:
    user = '[email protected]'
    pwd = '1234'
    response = requests.get(url, auth=(user, pwd))
    data = response.json()

    for article in data['articles']:
        article_id = article['id']
        body = article['body']
        decode_1 = int(article_id)
        decode_2 = unicodedata.normalize('NFKD', body)
        output_1.append(decode_1)
        output_1.append(decode_2)

    print(data['next_page'])
    url = 'https://.../articles.json' and data['next_page']

row = 0
col = 0

for output_1, output_2 in (listOf):
    worksheet.write_row(row, col, output_1)
    worksheet.write_column(col, col + 1, output_2)
    col += 1

workbook.close()
Update:
So basically, in my script I'm reading a JSON response that looks something like this:
"per_page": 30,
"previous_page": null,
"articles": [
{
"id": 360239848018,
"url": "https://.../articles/360239848018.json",
"html_url": "https://...",
"author_id": 5201232,
"body": "<div class....a lot of html" }
Upvotes: 1
Views: 911
Reputation: 3927
It looks like all the data that you are scraping with the code you've shared ends up in output_1, while output_2 stays empty.
Since you'd like to have the article_id in one column and the content in another column, I would suggest storing the content in output_2 instead of output_1.
Apart from that, you are using write_row() on output_1. As per the documentation (emphasis mine):
Write a row of data starting from (row, col).
But it sounds like you'd like to write it as a column.
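To illustrate the difference, here is a minimal standalone sketch (the demo.xlsx file name is made up for the example): write_row() spreads a sequence across a single row, while write_column() spreads it down a single column.

import xlsxwriter

wb = xlsxwriter.Workbook('demo.xlsx')   # throwaway file, only for illustration
ws = wb.add_worksheet()

ws.write_row(0, 0, [1, 2, 3])           # lands in A1, B1, C1 (across a row)
ws.write_column(2, 0, [1, 2, 3])        # lands in A3, A4, A5 (down a column)

wb.close()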
Another thing to keep in mind is that your listOf is a tuple containing two one-element lists ([output_1] and [output_2]). Iterating it won't get you far.
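A small example of what happens when that tuple is iterated (the values are made up): each element of listOf is a one-item list, so unpacking it into two names fails immediately.

output_1 = [1, 2, 3]
output_2 = ['a', 'b', 'c']
listOf = ([output_1], [output_2])

for out_ids, out_bodies in listOf:
    print(out_ids, out_bodies)
# ValueError: not enough values to unpack (expected 2, got 1)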
With all of the above said, this is what should work:
import csv
import requests
import unicodedata
import xlsxwriter

url = 'https://.../articles.json'
output_1 = []
output_2 = []

while url:
    user = '[email protected]'
    pwd = '1234'
    response = requests.get(url, auth=(user, pwd))
    data = response.json()

    for article in data['articles']:
        article_id = article['id']
        body = article['body']
        decode_1 = int(article_id)
        decode_2 = unicodedata.normalize('NFKD', body)
        output_1.append(decode_1)
        output_2.append(decode_2)
        #      ^^^^^^^^
        #      This is output_2 now.

    print(data['next_page'])
    url = 'https://.../articles.json' and data['next_page']

workbook = xlsxwriter.Workbook('bodies.xlsx')
worksheet = workbook.add_worksheet()

row = 0
col = 0

worksheet.write_column(row, col, output_1)
#         ^^^^^^^^^^^^
#         Use write_column() instead of write_row().
worksheet.write_column(row, col + 1, output_2)

workbook.close()
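With both lists written via write_column(), the ids end up in column A and the matching bodies in column B, one value per cell, which is the layout you asked for.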
Upvotes: 1