Reputation: 6525
Here I am encoding the data
post = """
='Brand New News Fr0m The Timber Industry!!'=
========Latest Profile==========
Energy & Asset Technology, Inc. (EGTY)
Current Price $0.15
================================
Recognize this undiscovered gem which is poised to jump!!
Please read the following Announcement in its Entierty and
Consider the Possibilities�
Watch this One to Trad,e!
Because, EGTY has secured the global rights to market
genetically enhanced fast growing, hard-wood trees!
EGTY trading volume is beginning to surge with landslide Announcement.
The value of this Stoc,k appears poised for growth! This one will not
remain on the ground floor for long.
KEEP READING!!!!!!!!!!!!!!!
===============
"BREAKING NEWS"
===============
-Energy and Asset Technology, Inc. (EGTY) owns a global license to market
the genetically enhanced Global Cedar growth trees, with plans to
REVOLUTIONIZE the forest-timber industry.
These newly enhanced Globa| Cedar trees require only 9-12 years of growth
before they can be harvested for lumber, whereas worldwide growth time for
lumber is 30-50 years.
Other than growing at an astonishing rate, the Global Cedar has a number
of other benefits. Its natural elements make it resistant to termites, and
the lack of oils and sap found in the wood make it resistant to forest fire,
ensuring higher returns on investments.
T
he wood is very lightweight and strong, lighter than Poplar and over twice
as strong as Balsa, which makes it great for construction. It also has
the unique ability to regrow itself from the stump, minimizing the land and
time to replant and develop new root systems.
Based on current resources and agreements, EGTY projects revenues of $140
Million with an approximate profit margin of 40% for each 9-year cycle. With
anticipated growth, EGTY is expected to challenge Deltic Timber Corp. during
its initial 9-year cycle.
Deltic Timber Corp. currently trades at over $38.00 a share with about $153
Million in revenues. As the reputation and demand for the Global Cedar tree
continues to grow around the world EGTY believes additional multi-million
dollar agreements will be forthcoming. The Global Cedar nursery has produced
about 100,000 infant plants and is developing a production growth target of
250,000 infant plants per month.
Energy and Asset Technology is currently in negotiations with land and business
owners in New Zealand, Greece and Malaysia regarding the purchase of their popular
and profitable fast growing infant tree plants. Inquiries from the governments of
Brazil and Ecuador are also being evaluated.
Conclusion:
The examples above show the Awesome, Earning Potential of little
known Companies That Explode onto Investor�s Radar Screens.
This s-t0ck will not be a Secret for long. Then You May Feel the Desire to Act Right
Now! And Please Watch This One Trade!!
GO EGTY!
All statements made are our express opinion only and should be treated as such.
We may own, take position and sell any securities mentioned at any time. Any
statements that express or involve discussions with respect to predictions,
goals, expectations, beliefs, plans, projections, object'ives, assumptions or
future events or perfo'rmance are not
statements of historical fact and may be
"forward,|ooking statements." forward,|ooking statements are based on expectations,
estimates and projections at the time the statements are made that involve a number
of risks and uncertainties which could cause actual results or events to differ
materially from those presently anticipated. This newsletter was paid $3,000 from
third party (IR Marketing). Forward,|ooking statements in this action may be identified
through the use of words such as: "pr0jects", "f0resee", "expects". in compliance with
Se'ction 17. {b), we disclose the holding of EGTY shares prior to the publication of
this report. Be aware of an inherent conflict of interest resulting from such holdings
due to our intent to profit from the liquidation of these shares. Shar,es may be sold
at any time, even after positive statements have been made regarding the above company.
Since we own shares, there is an inherent conflict of interest in our statements and
opinions. Readers of this publication are cautioned not
to place undue reliance on
forward,|ooking statements, which are based on certain assumptions and expectations
involving various risks and uncertainties that could cause results to differ materially
from those set forth in the forward- looking statements. This is not solicitation to
buy or sell st-0cks, this text is or informational purpose only and you should seek
professional advice from registered financial advisor before you do anything related
with buying or selling st0ck-s, penny st'0cks are very high risk and you can lose your
entire inves,tment.
"""
In [147]: post.encode('utf-8')
and I am getting this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 319: ordinal not in range(128)
Upvotes: 2
Views: 25150
Reputation: 89927
First, tell Python what encoding you're using by making this the second line of your file (or first, if you don't use a shebang):
# coding=utf-8
(see PEP 263)
Then, instead of using a byte string, always use unicode literals for textual content:
post = u"""
='Brand New News Fr0m The Timber Industry!!'=
etc. etc. etc."""
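To see why the original call fails, here is a minimal sketch. It assumes the stray character in the post is stored as the UTF-8 bytes `\xef\xbf\xbd` (consistent with the `0xef` in the traceback), and the variable names are purely illustrative. In Python 2, calling `.encode()` on a byte string first decodes it implicitly with the ASCII codec, and that hidden step is what raises the error. The sketch performs the decode explicitly so it behaves the same on Python 2 and 3:

```python
# Bytes as they would sit inside a Python 2 byte string (assumed sample data).
raw = b"Possibilities\xef\xbf\xbd"

# The implicit step Python 2 performs before .encode(): an ASCII decode.
try:
    raw.decode("ascii")
    message = None
    error_position = None
except UnicodeDecodeError as exc:
    message = str(exc)
    error_position = exc.start  # index of the first non-ASCII byte

# Decoding with the real encoding works, and encoding again round-trips.
text = raw.decode("utf-8")
encoded = text.encode("utf-8")

print(message)
print(encoded == raw)
```

With a unicode literal there is nothing to decode, so `post.encode('utf-8')` goes straight to the encoding step.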
Upvotes: 3
Reputation: 11614
Unicode is a table which tries to encompass all known letters, characters and signs. These are often loosely called glyphs, although strictly speaking each entry in the table is a code point. At the moment it holds somewhat over 110,000 meaningful signs. So the DECODED state is a code point in this table. But because a byte can't hold more than 8 bits = 256 states, you have to ENCODE the Unicode representation into a byte stream. The most widely used encoding is UTF-8, which is a superset of the older ASCII encoding. UTF-8 encodes each Unicode code point in one to four bytes.
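As a rough illustration of the one-to-four-byte range, the sketch below encodes four sample characters (chosen here only as examples) and counts the resulting bytes. The `u` prefix keeps it valid on both Python 2 and Python 3.3+:

```python
# Sample code points picked to need 1, 2, 3 and 4 bytes in UTF-8.
samples = [
    u"A",           # U+0041, plain ASCII
    u"\u00e9",      # U+00E9, e with acute accent
    u"\u20ac",      # U+20AC, euro sign
    u"\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

# Encode each one and record how many bytes it occupies.
lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)
```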
So encoding or decoding is always from unicode or towards unicode. If you want to transform from one encoding to another you have to do it over unicode:
         [decode]             [encode]
ASCII   --------->  UNICODE  --------->  UTF-8

1 Glyph                                  1 Glyph
=                   1 Glyph              =
1 Byte                                   1-4 Bytes
unicode_str = mystring.decode('ascii')
utf8_str = unicode_str.encode('utf-8')
(not the best example, because ASCII ALWAYS fits into utf-8)
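A case where the bytes actually change may be more instructive. The sketch below assumes the input is Latin-1 encoded (an assumption made for the example, not something from the question) and converts it to UTF-8 by going through unicode:

```python
# "cafe" with an accented e, as a Latin-1 byte string: one byte per character.
latin1_bytes = b"caf\xe9"

# Step 1: decode into unicode using the encoding the bytes really have.
unicode_str = latin1_bytes.decode("latin-1")

# Step 2: encode the unicode string into the target encoding.
utf8_bytes = unicode_str.encode("utf-8")

print(len(latin1_bytes), len(unicode_str), len(utf8_bytes))
```

The accented character occupies one byte in Latin-1 but two bytes (`\xc3\xa9`) in UTF-8, while the unicode string in the middle has four characters either way.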
So if you want to decode your post variable, you have to know which encoding the string is in. For implicit conversions, Python 2.x assumes ASCII by default, while Python 3.x uses UTF-8. You can check the default like this:
import sys
print(sys.getdefaultencoding())  # parentheses work on both Python 2 and 3
If your post variable is not defined in your source code, but read from an external byte stream, you MUST KNOW the encoding or you will be out of luck.
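When the encoding of the external source is known, one way to handle it is to let `io.open` do the decoding while reading. This is only a sketch: the temporary file and its contents are made up for the example, and UTF-8 is assumed to be the file's encoding.

```python
import io
import os
import tempfile

# Create a throwaway file holding UTF-8 bytes (stand-in for the external source).
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, "wb") as f:
    f.write(u"Consider the Possibilities\u2026".encode("utf-8"))

# Read it back as text; io.open decodes with the encoding we pass in.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(path)

# text is now a unicode string and can be re-encoded safely.
utf8_again = text.encode("utf-8")
print(len(text), len(utf8_again))
```

If the wrong encoding is passed, the read either raises a `UnicodeDecodeError` or silently produces wrong characters, which is why the encoding has to be known up front.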
Upvotes: 3