Amir Shabani
Amir Shabani

Reputation: 4207

Unexpected character when writting to Excel using Pandas

I have a dictionary like this:

film = {
    'ID': [],
    'Name': [],
    'Run Time': [],
    'Genre': [],
    'link': [],
    'name 2': []
}

Then I populate it in a for loop, like this:

film['ID'].append(film_id)
film['Name'].append(film_name)
film['Run Time'].append(film_runtime)
film['Genre'].append(film_genre)
film['link'].append(film_link)
film['name 2'].append(film_name2)

Then I convert the dictionary to a Pandas DataFrame, so that I can write it to an .xlsx file. Now before I actually write it, I print it to check the values of Run Time column. And everything is OK:

output_df = pd.DataFrame(film).set_index('ID')
print(output_df['Run Time'])

output:
ID
102    131
103     60
104       
105       
Name: Run Time, dtype: object

But then, when I write it, like this:

writer = ExcelWriter('output.xlsx')
output_df.to_excel(writer, 'فیلم')
writer.save()

The file looks like this:

weird output

As you can see, there's an extra ' (single quote) character in the file. This character is not visible. But I can highlight it:

highlighted

And if I remove it, the number goes RTL:

removed

So I thought the invisible character was LTR MARK (\u200E). I removed it like this:

film['Run Time'].append(film_runtime.replace('\u200E', ''))

But nothing happened, and the character is still there.

How can I fix this?

Upvotes: 0

Views: 461

Answers (2)

quamrana
quamrana

Reputation: 39374

You need to make sure that cells that need to be numbers are converted to numbers (typically ints) before converting to an .xlsx file.

In your case just:

film['Run Time'].append(int(film_runtime))

Upvotes: 1

albeksdurf
albeksdurf

Reputation: 124

The ' before a value in Excel forces the value to string. Looks like the Excel Writer is interpreting such list as an string array. Changing the type in the DataFrame should solve it.

Upvotes: 0

Related Questions