Europa

Reputation: 1284

Python read csv file and strip spaces from it

I've created a script that reads a CSV file. The output looks fine when I run it in PyCharm, but when I select the output text, copy it with Ctrl+C, and paste it into Notepad, I get a space between each letter.

For example, when I open the file in Excel I get this:

30.11.2020 09:03    Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E   SPF+%3CSeksjon+for+Passord+og+Forebygging%3E    Vennligst+endre+passordet+mitt+til+PST%7Bfacb0950fb7a5c537cf7fa68b8894027%7D

When I copy it from the PyCharm output I get this:

2 0 2 0 - 1 1 - 3 0   0 9 : 0 3 : 5 1    T o r b j % C 3 % B 8 r n   % 3 C T o r b j % C 3 % B 8 r n % 3 E       S P F   % 3 C S e k s j o n   f o r   P a s s o r d   o g   F o r e b y g g i n g % 3 E         V e n n l i g s t   e n d r e   p a s s o r d e t   m i t t   t i l   P S T % 7 B f a c b 0 9 5 0 f b 7 a 5 c 5 3 7 c f 7 f a 6 8 b 8 8 9 4 0 2 7 % 7 D 

How can I remove the white spaces?

I've tried using line = line.strip(), with no luck.

My script:

class Day05:
    print('')
    print('~~~~~~~~~~~~~~~~~~~~~~~~ Day 05 ~~~~~~~~~~~~~~~~~~~~~~~~')
    print('')

    def printDataInLogFile():
        # Header
        print("Datetime\t", end='')
        print("Name\t", end='')
        print("Section\t", end='')
        print("Message")

        # Read and loop line by line
        file1 = open('./log.csv', 'r')
        lines = file1.readlines()
        for line in lines:
            line = line.strip()
            line = line.replace('+', ' ')
            line = line.replace('%C3%A6', 'æ')
            line = line.replace('%C3%B8', 'ø')
            line = line.replace('%C3%A5', 'å')
            line = line.replace('%7B', '{')
            line = line.replace('%7D', '}')
            date = ""
            name = ""
            section = ""
            message = ""

            for i, d in enumerate(line.split(";")):
                if(i == 0):
                    date = d
                elif(i == 1):
                    name = d
                elif(i == 2):
                    section = d
                elif(i == 3):
                    message = d

            # Body
            if(name != ""):
                print(str(date) + "\t", end='')
                print(str(name) + "\t\t", end='')
                print(str(section) + "\t\t", end='')
                print(str(message))


    """ Script start """
    printDataInLogFile()

Some lines from log.csv:

2020-10-01 07:00:04;Lisbeth+%3CLisbeth%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7Bb53250c991675c7b0c712e9bdc2c1216%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:02:22;Unni+%3CUnni%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7B5cdadc1037fa416f7d79186adc55f1ff%7D
2020-10-01 07:03:11;Jan+%3CJan%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7B1241512147283b40bfe8e2eac36ac2dd%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:04:26;Maria+%3CMaria%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7Bca1d9d8d4243c374cb14faa8363bc0dc%7D
2020-10-01 07:06:52;Mellomleder+%3CMellomleder%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7B99e12ae9d06336a7d9c644641388450a%7D
2020-10-01 07:09:00;Robert+%3CRobert%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7Bda52537925c86ac5d5352edd78e10350%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:11:13;H%C3%A5kon+%3CH%C3%A5kon%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Vennligst+endre+passordet+mitt+til+PST%7B2a6fa4d619a88882dbcf1df5dff8ff65%7D
2020-10-01 07:11:56;Terje+%3CTerje%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Jeg+%C3%B8nsker+%C3%A5+endre+passord+til+PST%7B4970a0cdd3f0eb19e9ec1d7423f26de8%7D
2020-10-01 07:14:33;Anette+%3CAnette%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;I+dag+har+jeg+lyst+til+at+PST%7B1b956ee14848acccdc150db512b2084d%7D+skal+v%C3%A6re+passordet+mitt
2020-10-01 07:14:51;Daniel+%3CDaniel%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7B80f7c07f7d06bbcd38f3af5c90afe866%7D
2020-10-01 07:15:29;Systemeier+%3CSystemeier%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;Bytt+til+PST%7Be905beda4ccdfaf8c7b3388d057e37c4%7D

Upvotes: 1

Views: 948

Answers (3)

Tomalak

Reputation: 338238

I have the file in Excel then I get this:

30.11.2020 09:03

When I copy it from the PyCharm output I get this:

2 0 2 0 - 1 1 - 3 0   0 9

You've saved the file as "Unicode" in Excel (which means UTF-16), but you are not reading it with that encoding in Python.

# Read and loop line by line
with open('./log.csv', 'r', encoding='utf-16-le') as file1:
    for line in file1:
        print(line)
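
To see where the gaps between the letters come from, here is a small sketch. It assumes the file really is UTF-16-LE, as suggested above, and that it was read with an 8-bit default encoding such as cp1252; the question confirms neither. In UTF-16-LE every ASCII character is followed by a zero byte, and the wrong codec turns that byte into a NUL character, which many editors display as a space.

```python
# One row from the question, encoded as a UTF-16-LE file would store it (assumption).
sample = "2020-10-01 07:02:22;Unni+%3CUnni%3E"
raw_bytes = sample.encode("utf-16-le")

# An 8-bit codec keeps the zero byte that follows every ASCII character.
wrong = raw_bytes.decode("cp1252")
# The matching codec gives back the original text.
right = raw_bytes.decode("utf-16-le")

print(repr(wrong[:8]))
print(repr(right[:8]))
```

The characters in between are NUL characters rather than real spaces, which would explain why line.strip() has no visible effect on them.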

Notes

  • Use context managers to open files (with open(...) as f:) instead of naked open() calls.
  • Always open text files with an explicitly specified encoding. If you don't know the encoding, you need to find out. Trusting in defaults does not work here.
  • Use the csv module to read CSV files.
  • Use the urllib module to decode URL-encoded values, instead of trying to do manual string replacements.

E.g. (for a single input that represents the "value" part in a key=value pair):

from urllib.parse import parse_qs

raw_value = "Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E"
parsed_value = parse_qs(f"temp={raw_value}")            # -> {'temp': ['Torbjørn <Torbjørn>']}
actual_value = parsed_value['temp'][0]                  # -> 'Torbjørn <Torbjørn>'

This can be turned into a function:

def decode_url_value(raw_value):
    parsed_value = parse_qs(f"temp={raw_value}")
    return parsed_value['temp'][0]

decode_url_value("Torbj%C3%B8rn+%3CTorbj%C3%B8rn%3E")   # -> 'Torbjørn <Torbjørn>'
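
The notes above can be combined into one short sketch. It uses urllib.parse.unquote_plus instead of the parse_qs approach shown above, because it decodes a single value directly. The names read_log and SAMPLE are made up for illustration, and the rows are kept in memory so the example runs without the actual log.csv.

```python
import csv
import io
from urllib.parse import unquote_plus

# Two rows from the question's log.csv, held in memory so the sketch runs on its own.
SAMPLE = (
    "2020-10-01 07:02:22;Unni+%3CUnni%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;"
    "Vennligst+endre+passordet+mitt+til+PST%7B5cdadc1037fa416f7d79186adc55f1ff%7D\n"
    "2020-10-01 07:11:13;H%C3%A5kon+%3CH%C3%A5kon%3E;SPF+%3CSeksjon+for+Passord+og+Forebygging%3E;"
    "Vennligst+endre+passordet+mitt+til+PST%7B2a6fa4d619a88882dbcf1df5dff8ff65%7D\n"
)


def read_log(handle):
    """Yield one list of decoded fields per semicolon-separated row."""
    for row in csv.reader(handle, delimiter=";"):
        # unquote_plus turns '+' into a space and decodes %XX escapes as UTF-8.
        yield [unquote_plus(field) for field in row]


rows = list(read_log(io.StringIO(SAMPLE)))
for date, name, section, message in rows:
    print(date, name, section, message, sep="\t")
```

For the real file, the same function could be fed a handle opened with open('./log.csv', newline='', encoding='utf-16-le'), assuming that encoding is the right one.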

Upvotes: 2

OctaveL

Reputation: 1045

If you use the unidecode and urllib libraries, you can do this:

from unidecode import unidecode
from urllib.parse import unquote

...
file1 = open('./log.csv', 'r')
lines = file1.readlines()
for line in lines:
   line = unidecode(unquote(line))
   line = line.strip()
   line = line.replace('+', ' ')
   # line = line.replace('%C3%A6', 'æ')
   # line = line.replace('%C3%B8', 'ø')
   # line = line.replace('%C3%A5', 'å')
   # line = line.replace('%7B', '{')
   # line = line.replace('%7D', '}')
...

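You no longer need to replace the special characters by hand. Note that unidecode transliterates to ASCII, so 'ø' becomes 'o'; leave it out if you want to keep the Norwegian letters.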

Upvotes: 1

itaisls9

Reputation: 60

str.strip() only removes leading and trailing whitespace. To remove all space characters, use str.replace(" ", "").
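
A minimal illustration of the difference, using a made-up string. Keep in mind that replace(" ", "") also removes the spaces you want to keep, and that it has no effect if the gaps are NUL characters rather than real spaces (an assumption about the question's file, shown in the last line).

```python
text = "  2 0 2 0 - 1 1  "

stripped = text.strip()            # only the outer spaces go away
collapsed = text.replace(" ", "")  # every space character is removed

print(repr(stripped))
print(repr(collapsed))

# NUL characters are not spaces, so this string comes back unchanged.
with_nuls = "2\x000\x002\x000"
unchanged = with_nuls.replace(" ", "")
print(repr(unchanged))
```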

Upvotes: -1
