Tsingis

Reputation: 525

Turn special characters into ASCII-like characters or something else without losing readability

I'm trying to format data from an .ics calendar file into any output format, such as JSON or even Python's print(). I'm looking for a good way to replace special characters with ASCII-like characters without losing readability. Examples below. Any tips?

Summary field value in ics file

FORMULA 1 HEINEKEN GRANDE PRÉMIO DE PORTUGAL 2021 - Race
FORMULA 1 MYWORLD GROSSER PREIS VON ÖSTERREICH 2021 - Race

Summary key value in json file

FORMULA 1 HEINEKEN GRANDE PR\u00c3\u0089MIO DE PORTUGAL 2021 - Race
FORMULA 1 MYWORLD GROSSER PREIS VON \u00c3\u0096STERREICH 2021 - Race

Sample code to reproduce problem

import requests
import json
from icalendar import Calendar

## LOGIC HERE ##
def format_text(text):
    text = str(text)
    return text


url = "http://www.formula1.com/calendar/Formula_1_Official_Calendar.ics"
res = requests.get(url)
calendar = Calendar.from_ical(res.text)
events = [
    {
        "id": event["UID"].split("@")[-1].strip(),
        "startTime": event["DTSTART"].dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3],
        "summary": format_text(event["SUMMARY"])
    } for event in calendar.walk("VEVENT") if str(event["UID"]).split("@")[0].startswith("Race")]


with open("events.json", "w") as f:
    json.dump(events, f, indent=2)

Upvotes: 0

Views: 277

Answers (2)

Mark Tolonen

Reputation: 177971

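Don't decode the .ics data yourself; pass the raw bytes directly to Calendar.from_ical by using res.content instead of res.text. The \u00c3\u0089 in your JSON is the two UTF-8 bytes of É read as two separate single-byte characters, which suggests res.text was decoded with the wrong encoding. Given the bytes, Calendar decodes the data correctly as UTF-8 (probably part of the .ics spec), and print can display the resulting Unicode strings. For the JSON, open the file with UTF-8 encoding and pass ensure_ascii=False, as @JosefZ recommended, so the characters appear correctly there as well: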

import requests
import json
from icalendar import Calendar

url = 'http://www.formula1.com/calendar/Formula_1_Official_Calendar.ics'
res = requests.get(url)
# Pass the raw bytes (res.content), not the decoded text (res.text).
calendar = Calendar.from_ical(res.content)
# Keep only the events whose UID starts with "Race".
events = [
    {
        'id': event['UID'].split('@')[-1].strip(),
        'startTime': event['DTSTART'].dt.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3],
        'summary': event['SUMMARY']
    } for event in calendar.walk('VEVENT') if str(event['UID']).split('@')[0].startswith('Race')]

for event in events:
    print(event['summary'])

# Write UTF-8 and keep non-ASCII characters unescaped in the JSON.
with open('events.json', 'w', encoding='utf8') as f:
    json.dump(events, f, ensure_ascii=False, indent=2)

print Output:

FORMULA 1 GULF AIR BAHRAIN GRAND PRIX 2021 - Race
FORMULA 1 PIRELLI GRAN PREMIO DEL MADE IN ITALY E DELL'EMILIA ROMAGNA 2021 - Race
FORMULA 1 HEINEKEN GRANDE PRÉMIO DE PORTUGAL 2021 - Race
FORMULA 1 ARAMCO GRAN PREMIO DE ESPAÑA 2021 - Race
FORMULA 1 GRAND PRIX DE MONACO 2021 - Race
FORMULA 1 AZERBAIJAN GRAND PRIX 2021 - Race
FORMULA 1 HEINEKEN GRAND PRIX DU CANADA 2021 - Race
FORMULA 1 EMIRATES GRAND PRIX DE FRANCE 2021 - Race
FORMULA 1 MYWORLD GROSSER PREIS VON ÖSTERREICH 2021 - Race
FORMULA 1 PIRELLI BRITISH GRAND PRIX 2021 - Race
FORMULA 1 MAGYAR NAGYDÍJ 2021 - Race
FORMULA 1 ROLEX BELGIAN GRAND PRIX 2021 - Race
FORMULA 1 HEINEKEN DUTCH GRAND PRIX 2021 - Race
FORMULA 1 HEINEKEN GRAN PREMIO D’ITALIA 2021 - Race
FORMULA 1 VTB RUSSIAN GRAND PRIX 2021 - Race
FORMULA 1 SINGAPORE AIRLINES SINGAPORE GRAND PRIX 2021 - Race
FORMULA 1 JAPANESE GRAND PRIX 2021 - Race
FORMULA 1 ARAMCO UNITED STATES GRAND PRIX 2021 - Race
FORMULA 1 GRAN PREMIO DE LA CIUDAD DE MÉXICO 2021 - Race
FORMULA 1 HEINEKEN GRANDE PRÊMIO DE SÃO PAULO 2021 - Race
FORMULA 1 ROLEX AUSTRALIAN GRAND PRIX 2021 - Race
FORMULA 1 SAUDI ARABIAN GRAND PRIX 2021 - Race
FORMULA 1 ETIHAD AIRWAYS ABU DHABI GRAND PRIX 2021 - Race

events.json:

[
  {
    "id": "1064",
    "startTime": "2021-03-28T16:00:00.000",
    "summary": "FORMULA 1 GULF AIR BAHRAIN GRAND PRIX 2021 - Race"
  },
  {
    "id": "1065",
    "startTime": "2021-04-18T14:00:00.000",
    "summary": "FORMULA 1 PIRELLI GRAN PREMIO DEL MADE IN ITALY E DELL'EMILIA ROMAGNA 2021 - Race"
  },
  {
    "id": "1066",
    "startTime": "2021-05-02T15:00:00.000",
    "summary": "FORMULA 1 HEINEKEN GRANDE PRÉMIO DE PORTUGAL 2021 - Race"
  },
  {
    "id": "1086",
    "startTime": "2021-05-09T14:00:00.000",
    "summary": "FORMULA 1 ARAMCO GRAN PREMIO DE ESPAÑA 2021 - Race"
  },
  {
    "id": "1067",
    "startTime": "2021-05-23T14:00:00.000",
    "summary": "FORMULA 1 GRAND PRIX DE MONACO 2021 - Race"
  },
  {
    "id": "1068",
    "startTime": "2021-06-06T13:00:00.000",
    "summary": "FORMULA 1 AZERBAIJAN GRAND PRIX 2021 - Race"
  },
  {
    "id": "1069",
    "startTime": "2021-06-13T19:00:00.000",
    "summary": "FORMULA 1 HEINEKEN GRAND PRIX DU CANADA 2021 - Race"
  },
  {
    "id": "1070",
    "startTime": "2021-06-27T14:00:00.000",
    "summary": "FORMULA 1 EMIRATES GRAND PRIX DE FRANCE 2021 - Race"
  },
  {
    "id": "1071",
    "startTime": "2021-07-04T14:00:00.000",
    "summary": "FORMULA 1 MYWORLD GROSSER PREIS VON ÖSTERREICH 2021 - Race"
  },
  {
    "id": "1072",
    "startTime": "2021-07-18T15:00:00.000",
    "summary": "FORMULA 1 PIRELLI BRITISH GRAND PRIX 2021 - Race"
  },
  {
    "id": "1073",
    "startTime": "2021-08-01T14:00:00.000",
    "summary": "FORMULA 1 MAGYAR NAGYDÍJ 2021 - Race"
  },
  {
    "id": "1074",
    "startTime": "2021-08-29T14:00:00.000",
    "summary": "FORMULA 1 ROLEX BELGIAN GRAND PRIX 2021 - Race"
  },
  {
    "id": "1075",
    "startTime": "2021-09-05T14:00:00.000",
    "summary": "FORMULA 1 HEINEKEN DUTCH GRAND PRIX 2021 - Race"
  },
  {
    "id": "1076",
    "startTime": "2021-09-12T14:00:00.000",
    "summary": "FORMULA 1 HEINEKEN GRAN PREMIO D’ITALIA 2021 - Race"
  },
  {
    "id": "1077",
    "startTime": "2021-09-26T13:00:00.000",
    "summary": "FORMULA 1 VTB RUSSIAN GRAND PRIX 2021 - Race"
  },
  {
    "id": "1078",
    "startTime": "2021-10-03T13:00:00.000",
    "summary": "FORMULA 1 SINGAPORE AIRLINES SINGAPORE GRAND PRIX 2021 - Race"
  },
  {
    "id": "1079",
    "startTime": "2021-10-10T06:00:00.000",
    "summary": "FORMULA 1 JAPANESE GRAND PRIX 2021 - Race"
  },
  {
    "id": "1080",
    "startTime": "2021-10-24T20:00:00.000",
    "summary": "FORMULA 1 ARAMCO UNITED STATES GRAND PRIX 2021 - Race"
  },
  {
    "id": "1081",
    "startTime": "2021-10-31T19:00:00.000",
    "summary": "FORMULA 1 GRAN PREMIO DE LA CIUDAD DE MÉXICO 2021 - Race"
  },
  {
    "id": "1082",
    "startTime": "2021-11-07T17:00:00.000",
    "summary": "FORMULA 1 HEINEKEN GRANDE PRÊMIO DE SÃO PAULO 2021 - Race"
  },
  {
    "id": "1083",
    "startTime": "2021-11-21T06:00:00.000",
    "summary": "FORMULA 1 ROLEX AUSTRALIAN GRAND PRIX 2021 - Race"
  },
  {
    "id": "1085",
    "startTime": "2021-12-05T16:00:00.000",
    "summary": "FORMULA 1 SAUDI ARABIAN GRAND PRIX 2021 - Race"
  },
  {
    "id": "1084",
    "startTime": "2021-12-12T13:00:00.000",
    "summary": "FORMULA 1 ETIHAD AIRWAYS ABU DHABI GRAND PRIX 2021 - Race"
  }
]
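
To see why the question's JSON contained \u00c3\u0089, here is a minimal standard-library sketch that reproduces it. It assumes the file's bytes are UTF-8 and that the text was decoded as ISO-8859-1, which is my guess at what happened with res.text; the variable names are only for illustration.

```python
import json

# Assumption: the .ics file stores "É" as the two UTF-8 bytes C3 89.
raw = "PRÉMIO".encode("utf-8")

# Decoding those bytes with a single-byte encoding splits "É" in two.
wrong = raw.decode("iso-8859-1")
# Decoding them as UTF-8 gives back the original text.
right = raw.decode("utf-8")

print(json.dumps(wrong))                      # "PR\u00c3\u0089MIO"
print(json.dumps(right, ensure_ascii=False))  # "PRÉMIO"

# A string that was already mis-decoded this way can be repaired
# by reversing the wrong step and decoding again as UTF-8.
repaired = wrong.encode("iso-8859-1").decode("utf-8")
print(repaired)                               # PRÉMIO
```

The repair at the end is only a fallback for text that is already damaged; using res.content avoids the problem in the first place.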

Upvotes: 1

JosefZ

Reputation: 30153

with open("events.json", mode="w", encoding="utf-8") as f:
    json.dump(events, f, indent=2, ensure_ascii=False)

From json.dump docs:

If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.

I pass encoding="utf-8" to open because the default encoding is platform dependent (whatever locale.getpreferredencoding() returns).
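
A small sketch of the difference, using json.dumps so no file is needed. The event dictionary here is made up for illustration and is not taken from the calendar feed.

```python
import json

# Hypothetical sample record with one non-ASCII character.
event = {"summary": "GROSSER PREIS VON ÖSTERREICH"}

escaped = json.dumps(event)                       # ensure_ascii=True is the default
readable = json.dumps(event, ensure_ascii=False)  # keep characters as-is

print(escaped)   # {"summary": "GROSSER PREIS VON \u00d6STERREICH"}
print(readable)  # {"summary": "GROSSER PREIS VON ÖSTERREICH"}

# Both forms are valid JSON and load back to the same data.
assert json.loads(escaped) == json.loads(readable) == event
```

So the escaped output is not wrong, only harder to read; ensure_ascii=False changes how the file looks, not what it contains.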

Upvotes: 1
