Reputation: 1794
I have a file that contains some lines like this:
StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4
Corresponding to these lines, I have some files on disk, but they are saved under the decoded form of the name:
StatsLearning_Lect1_2a_111213_v2_[2wLfFB_6SKI]_[tag22].mp4
I need to take each file name from the first list, find the matching file on disk by its decoded name, and rename it. To do this I need to decode the HTML entities in the file name, so I wrote something like this:
import os
from html.parser import HTMLParser

fpListDwn = open('listDwn', 'r')
for lineNumberOnList, fileName in enumerate(fpListDwn):
    print(HTMLParser().unescape(fileName))
But this has no effect when I run it; the names are printed unchanged. Part of the output is:
meysampg@freedom:~/Downloads/Practical Machine Learning$ python3 changeName.py
StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4
StatsLearning_Lect1_2b_111213_v2_%5BLvaTokhYnDw%5D_%5Btag22%5D.mp4
StatsLearning_Lect3_4a_110613_%5BWjyuiK5taS8%5D_%5Btag22%5D.mp4
StatsLearning_Lect3_4b_110613_%5BUvxHOkYQl8g%5D_%5Btag22%5D.mp4
StatsLearning_Lect3_4c_110613_%5BVusKAosxxyk%5D_%5Btag22%5D.mp4
How can I fix this?
Upvotes: 1
Views: 81
Reputation: 222
I think you should use urllib.parse instead of html.parser, because the names are percent-encoded rather than HTML-escaped:
>>> f="StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4"
>>> import urllib.parse as parse
>>> f
'StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4'
>>> parse.unquote(f)
'StatsLearning_Lect1_2a_111213_v2_[2wLfFB_6SKI]_[tag22].mp4'
So your script should look like:
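import urllib.parse as parse

# Each line of 'listDwn' holds one percent-encoded file name.
with open('listDwn', 'r') as fpListDwn:
    for fileName in fpListDwn:
        # Drop the trailing newline so print() does not emit a blank line.
        print(parse.unquote(fileName.rstrip('\n')))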
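Since the goal is to match the list against files on disk, here is a minimal sketch that builds a mapping from each encoded line to its decoded name. The helper name `decoded_names` is hypothetical and not part of the original script, and the list file is assumed to hold one name per line:

```python
from urllib.parse import unquote


def decoded_names(list_path):
    """Map each percent-encoded line of list_path to its decoded name.

    Hypothetical helper: assumes one file name per line; blank lines are skipped.
    """
    mapping = {}
    with open(list_path, 'r', encoding='utf-8') as fp:
        for line in fp:
            encoded = line.strip()  # remove the trailing newline and stray spaces
            if encoded:
                mapping[encoded] = unquote(encoded)
    return mapping
```

The decoded values could then be checked with os.path.exists before any renaming is attempted, so that a missing file does not raise an error halfway through the list.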
Upvotes: 2
Reputation: 36545
This is actually "percent encoding" (also called URL encoding), not HTML encoding; see this question:
How to percent-encode URL parameters in Python?
Basically, you want to use urllib.parse.unquote instead:
from urllib.parse import unquote
unquote('StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4')
Out[192]: 'StatsLearning_Lect1_2a_111213_v2_[2wLfFB_6SKI]_[tag22].mp4'
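To make the difference concrete, here is a small sketch comparing the two decoders on the file name from the question. It uses html.unescape from the standard library as the HTML-entity decoder, and the 'a &amp; b' string is only an illustrative input, not something from the question:

```python
import html
from urllib.parse import unquote

name = 'StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4'

# html.unescape only decodes HTML entities such as &amp; or &#91;,
# so a percent-encoded name passes through unchanged.
html_result = html.unescape(name)

# unquote decodes %XX escapes: %5B becomes '[' and %5D becomes ']'.
url_result = unquote(name)

# For comparison: an HTML entity is decoded by html.unescape, not by unquote.
entity_via_html = html.unescape('a &amp; b')
entity_via_unquote = unquote('a &amp; b')

print(html_result == name)
print(url_result)
print(entity_via_html)
print(entity_via_unquote)
```

This is why the original script appeared to do nothing: the input contained no HTML entities for the HTML decoder to act on.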
Upvotes: 1