Phil

Reputation: 993

Notepad++ convert to UTF-8 multiple files

The "Convert to UTF-8 without BOM" function of Notepad++ is really nice. But I have 200 files, and all of them need to be converted. So I found this little Python script:

import os
from Npp import notepad, console  # provided by the Python Script plugin

filePathSrc = "C:\\Temp\\UTF8"
# Binary formats that must not be opened and re-encoded as text
skipExtensions = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')
for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        # Case-insensitive check, so .GIF and .gif are both skipped
        if not fn.lower().endswith(skipExtensions):
            fullPath = os.path.join(root, fn)
            notepad.open(fullPath)
            console.write(fullPath + "\r\n")
            notepad.runMenuCommand("Encoding", "Convert to UTF-8 without BOM")
            notepad.save()
            notepad.close()

It goes through every file; I can see that. But after it has finished, the charset is still ANSI in my case :/

Can anyone help me?

Upvotes: 19

Views: 32235

Answers (4)

Just Me

Reputation: 1053

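Use the Notepad++ Python Script plugin. Copy this code into a new script. It prepends a UTF-8 BOM to every file under the folder you enter; the file contents themselves are not re-encoded: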

# -*- coding: utf-8 -*-
from __future__ import print_function

from Npp import notepad
import os

utf8_bom = b'\xEF\xBB\xBF'
top_level_dir = notepad.prompt('Paste path to top-level folder to process:', '', '')
if top_level_dir is not None and len(top_level_dir) > 0:
    if not os.path.isdir(top_level_dir):
        print('bad input for top-level folder')
    else:
        for (root, dirs, files) in os.walk(top_level_dir):
            for file in files:
                full_path = os.path.join(root, file)
                print(full_path)
                with open(full_path, 'rb') as f:
                    data = f.read()
                if len(data) > 0:
                    # Compare all three BOM bytes, not just the first one
                    if not data.startswith(utf8_bom):
                        try:
                            with open(full_path, 'wb') as f:
                                f.write(utf8_bom + data)
                            print('added BOM:', full_path)
                        except IOError:
                            print("can't change - probably read-only?:", full_path)
                    else:
                        print('already has BOM:', full_path)
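
The BOM test can also be pulled out into small helpers, which makes it easy to try in a plain Python interpreter before running anything inside Notepad++. This is only a sketch: `has_utf8_bom` and `add_utf8_bom` are made-up names for illustration and not part of the Python Script plugin.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the byte string starts with the 3-byte UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data):
    """Prepend the UTF-8 BOM unless the data already carries one."""
    if has_utf8_bom(data):
        return data
    return codecs.BOM_UTF8 + data
```

Note that these helpers only add the marker bytes. A file that contains non-ASCII characters in an ANSI code page would still need to be re-encoded separately.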

A second solution is to use a regex find and replace:

Find in Files:
SEARCH: \A
REPLACE BY: \x{FEFF}
FILTERS: *.html
(Confirm the first prompt with OK; don't cancel.)

Upvotes: 0

klaus

Reputation: 79

You can also record and play back a macro. That's what worked for me: since the plugin manager is somehow broken, I don't have Python available.

  • drag a set of files (or all of them; I think there is a limit on the maximum number of files) into Notepad++
  • Macro -> Start recording
  • do the conversion
  • save file
  • close file
  • Macro -> Stop recording

You can play back the macro by selecting

  • Macro -> Run a Macro Multiple Times
  • Enter a value such that all files are processed

Since the files are closed after processing, you will know which files have not been processed yet.

Upvotes: 7

Hrvoje

Reputation: 15152

Here is what worked for me:

Go to Notepad++ -> Plugins -> Plugins Admin.

Find and install the Python Script plugin.

Create a new Python script with Plugins -> Python Script -> New script.

Insert this code into your script:

import os
from Npp import notepad, console  # provided by the Python Script plugin

filePathSrc = "C:\\Users\\YourUsername\\Desktop\\txtFolder"
for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        # Only .txt and .csv files are converted
        if fn.endswith(('.txt', '.csv')):
            fullPath = os.path.join(root, fn)
            notepad.open(fullPath)
            console.write(fullPath + "\r\n")
            notepad.runMenuCommand("Encoding", "Convert to UTF-8")
            notepad.save()
            notepad.close()

Replace C:\\Users\\YourUsername\\Desktop\\txtFolder with the path to the Windows folder where your files are.

The script works with .txt and .csv files and ignores all other files in the folder.

Run the script with Plugins -> Python Script -> Scripts -> name of your script.
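
To preview which files the extension filter would pick up before letting Notepad++ touch them, the folder walk can be run on its own in a plain Python interpreter. A minimal sketch; `find_files` and its parameters are hypothetical names for illustration, not part of the plugin, and unlike the script above the match here is case-insensitive:

```python
import os


def find_files(top_dir, extensions=('.txt', '.csv')):
    """Return the sorted paths under top_dir whose names end in one of
    the given extensions (compared case-insensitively)."""
    wanted = tuple(ext.lower() for ext in extensions)
    matches = []
    for root, dirs, files in os.walk(top_dir):
        for fn in files:
            if fn.lower().endswith(wanted):
                matches.append(os.path.join(root, fn))
    return sorted(matches)
```

Printing the result of `find_files("C:\\Users\\YourUsername\\Desktop\\txtFolder")` lists the candidates without opening or changing any of them.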

Upvotes: 31

Phil

Reputation: 993

Found my mistake: my Notepad++ is in German. So check whether the menu is called "Encoding" or, as in my case, "Kodierung", and whether the command is called "Convert to UTF-8 without BOM" or "Konvertiere zu UTF-8 ohne BOM".

That helped me out!
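
If the localized menu labels are a concern, the conversion can also be done outside Notepad++ altogether, in plain Python. A minimal sketch, assuming the "ANSI" files are Windows-1252 (that is an assumption; other systems use a different ANSI code page). `to_utf8` and `convert_file` are made-up helper names:

```python
def to_utf8(data, source_encoding='cp1252'):
    """Re-encode raw bytes to UTF-8 without a BOM.

    Data that already decodes as UTF-8 (including plain ASCII) is
    returned unchanged, so running the conversion twice is harmless.
    """
    try:
        data.decode('utf-8')
        return data
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in source_encoding.
        # This raises UnicodeDecodeError for bytes undefined in that encoding.
        return data.decode(source_encoding).encode('utf-8')


def convert_file(path, source_encoding='cp1252'):
    """Rewrite one file in place as UTF-8 without a BOM."""
    with open(path, 'rb') as f:
        data = f.read()
    with open(path, 'wb') as f:
        f.write(to_utf8(data, source_encoding))
```

It is worth trying this on a copy of the folder first, since a wrong guess for the source encoding silently produces wrong characters.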

Upvotes: 9
