Reputation: 352
I'm developing a script which gathers some YouTube data. The script of course creates a pandas
dataframe which is later exported to Excel. I'm experiencing two major issues which somehow seem to be related to each other.
So, Excel 365 allows users to insert an image to a cell using IMAGE()
formula (https://support.microsoft.com/en-au/office/image-function-7e112975-5e52-4f2a-b9da-1d913d51f5d5). Script extracts YouTube thumbnail link to a video and saves it to a defaultdict(list)
dictionary. Next and in parallel, the IMAGE()
formula string is created. After saving the df
to .xlsx
by a dedicated ExcelWriter
(as recommended here: https://stackoverflow.com/a/58062606/11485896) my formulas are always followed by =@
no matter which name and settings - English or local - I use. It's strange because xlsxwriter
requires English names: https://xlsxwriter.readthedocs.io/working_with_formulas.html).
Code (some parts are deleted for better readability):
if export_by_xlsxwriter:
# English formula name - recommended by xlsxwriter guide
channel_videos_data_dict["thumbnailHyperlink_en"].append(
fr'=IMAGE("{thumbnail_url}",,1)')
# local formula name
# note: in my local language formula arguments are splitted by ";" - not ","
# interestingly, using ";" makes workbook corrupted
channel_videos_data_dict["thumbnailHyperlink_locale"].append(
fr'=OBRAZ("{thumbnail_url}",,1)')
writer: pd.ExcelWriter = pd.ExcelWriter("data.xlsx", engine = "xlsxwriter")
df.to_excel(writer)
writer.save()
writer.close()
I managed to save this df
to .csv
. Formulas now work fine (written in local language!) but I lose all the implicit formatting (Excel automatically converts urls to hyperlinks etc.), encoding is crashed and some videos IDs which are followed by -
are mistakenly considered as formulas (ironically). Code:
df.to_csv("data.csv", encoding = "utf-8", sep = ";")
I thought I can at least deal with encoding issues:
df.to_csv("data.csv", encoding = "windows-1250", sep = ";")
...but I get this error:
# ironically again, this is "loudly crying face" emoji ðŸ˜
UnicodeEncodeError:
'charmap' codec can't encode character '\U0001f62d' in position 305: character maps to <undefined>
Thus, my questions are:
df
using xlsxwriter
with formulas preserved and working? (get rid of @
in short)df
to .csv
with proper encoding and videos IDs starting with -
treated as text and text only?Upvotes: 0
Views: 382
Reputation: 41584
The Implicit Intersection Operator @
in a formula usually means that an array formula is returning a scalar value (see the XlsxWriter docs) although it can sometimes indicate an unknown formula.
In your case the =IMAGE()
function is relatively new to Excel and as such it is classified as "future function" (see the XlsxWriter docs section on Formulas added in Excel 2010 and later).
As a result you will need to prefix it with _xlfn.
.
import xlsxwriter
workbook = xlsxwriter.Workbook("image.xlsx")
worksheet = workbook.add_worksheet()
# Make the cell bigger for clarity.
worksheet.set_row(0, 80)
worksheet.set_column(0, 0, 14)
# Insert an image via a formula.
worksheet.write(0, 0, '=_xlfn.IMAGE("https://support.content.office.net/en-us/media/35aecc53-b3c1-4895-8a7d-554716941806.jpg")')
workbook.close()
This will fix the @
issue but the formula may still not work. I get this the first time I try to load the file:
However, once I allow/trust the image/url it loads as expected:
You may get different results depending on your security setting and/or OS.
Upvotes: 5