Reputation: 27
I am running this code in Python 3.7:
import matplotlib.pylab as plt
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def frequency_analysis(plain_text):
    # the text we analyse
    plain_text = plain_text.upper()
    # we use a dictionary to store the letter-frequency pairs
    letter_frequency = {}
    # initialize the dictionary (of course with 0 frequencies)
    for letter in LETTERS:
        letter_frequency[letter] = 0
    # let's consider the text we want to analyse
    for letter in plain_text:
        # we keep incrementing the occurrence of the given letter
        if letter in LETTERS:
            letter_frequency[letter] += 1
    return letter_frequency

def plot_distribution(letter_frequency):
    centers = range(len(LETTERS))
    plt.xlabel("Letters")
    plt.ylabel("Numbers")
    plt.bar(centers, letter_frequency.values(), align='center', tick_label=letter_frequency.keys())
    plt.xlim([0,len(LETTERS)-1])
    plt.show()

if __name__ == "__main__":
    plain_text = "Shannon defined the quantity of information produced by a source for example, the quantity in a message by a formula similar to the equation that defines thermodynamic entropy in physics. In its most basic terms, Shannon's informational entropy is the number of binary digits required to encode a message. Today that sounds like a simple, even obvious way to define how much information is in a message. In 1948, at the very dawn of the information age, this digitizing of information of any sort was a revolutionary step. His paper may have been the first to use the word bit, short for binary digit. As well as defining information, Shannon analyzed the ability to send information through a communications channel. He found that a channel had a certain maximum transmission rate that could not be exceeded. Today we call that the bandwidth of the channel. Shannon demonstrated mathematically that even in a noisy channel with a low bandwidth, essentially perfect, error-free communication could be achieved by keeping the transmission rate within the channel's bandwidth and by using error-correcting schemes: the transmission of additional bits that would enable the data to be extracted from the noise-ridden signal. Today everything from modems to music CDs rely on error-correction to function. A major accomplishment of quantum-information scientists has been the development of techniques to correct errors introduced in quantum information and to determine just how much can be done with a noisy quantum communications channel or with entangled quantum bits (qubits) whose entanglement has been partially degraded by noise."
    frequencies = frequency_analysis(plain_text)
    plot_distribution(frequencies)
This is the output I get. There is black noise along the x-axis.
This is the output of the same code when I run it on Python 2.7:
The black noise does not appear in Python 2.7.
Is there a way to remove the black noise in Python 3.7?
Upvotes: 1
Views: 248
Reputation: 339510
It's always a bit dangerous to rely on the order of a dictionary. I would therefore suggest the following solution, which is much shorter and does not require a sorted dictionary. It works with Python 2.7 or 3.5 and higher, but requires matplotlib >= 2.2.
from collections import Counter
import matplotlib.pylab as plt
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def frequency_analysis(plain_text):
    return Counter(plain_text.replace(" ", "").upper())

def plot_distribution(letter_frequency):
    plt.xlabel("Letters")
    plt.ylabel("Numbers")
    plt.bar(list(LETTERS), [letter_frequency[c] for c in LETTERS], align='center')
    plt.show()

if __name__ == "__main__":
    plain_text = "Shannon defined the quantity of information produced by a source for example, the quantity in a message by a formula similar to the equation that defines thermodynamic entropy in physics. In its most basic terms, Shannon's informational entropy is the number of binary digits required to encode a message. Today that sounds like a simple, even obvious way to define how much information is in a message. In 1948, at the very dawn of the information age, this digitizing of information of any sort was a revolutionary step. His paper may have been the first to use the word bit, short for binary digit. As well as defining information, Shannon analyzed the ability to send information through a communications channel. He found that a channel had a certain maximum transmission rate that could not be exceeded. Today we call that the bandwidth of the channel. Shannon demonstrated mathematically that even in a noisy channel with a low bandwidth, essentially perfect, error-free communication could be achieved by keeping the transmission rate within the channel's bandwidth and by using error-correcting schemes: the transmission of additional bits that would enable the data to be extracted from the noise-ridden signal. Today everything from modems to music CDs rely on error-correction to function. A major accomplishment of quantum-information scientists has been the development of techniques to correct errors introduced in quantum information and to determine just how much can be done with a noisy quantum communications channel or with entangled quantum bits (qubits) whose entanglement has been partially degraded by noise."
    frequencies = frequency_analysis(plain_text)
    plot_distribution(frequencies)
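The reason no zero-initialized dictionary is needed here is that a Counter returns 0 for any key it has never seen, so indexing it with every letter in LETTERS is safe even when a letter does not occur in the text. A minimal sketch of that behaviour, without any plotting (the sample string is made up for illustration):

```python
from collections import Counter

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def frequency_analysis(plain_text):
    # Same approach as above: count every character except spaces
    return Counter(plain_text.replace(" ", "").upper())

# Hypothetical sample text, chosen only for this illustration
counts = frequency_analysis("Hello, World")

# Letters that never occur report 0 instead of raising KeyError
heights = [counts[c] for c in LETTERS]

print(counts["L"])   # 3
print(counts["Z"])   # 0
print(len(heights))  # 26
```

Note that punctuation such as the comma is still counted by the Counter, but it never reaches the plot because only the keys in LETTERS are looked up.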
Upvotes: 0
Reputation: 39072
The problem is in the tick_label argument. In Python 3.6, letter_frequency.keys() returns a dictionary view object rather than a list, and hence the labels overlap in a weird way. Just convert it to a list to resolve the problem.
If you print letter_frequency.keys() in Python 3.6, you get
dict_keys(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'])
If you do the same in Python 2.x, you will get
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
Hence, if you are using Python 3.6, convert letter_frequency.keys() to a list. This post discusses this Python version issue comprehensively.
Code
def plot_distribution(letter_frequency):
    centers = range(len(LETTERS))
    plt.xlabel("Letters")
    plt.ylabel("Numbers")
    plt.bar(centers, letter_frequency.values(), align='center',
            tick_label=list(letter_frequency.keys()))  # <--- list conversion
    plt.xlim([0,len(LETTERS)-1])
    plt.show()  # as in the question's original function
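The difference between the two Python versions can be checked without matplotlib. A minimal sketch for Python 3.7 or later, using a small made-up dictionary in place of the full letter_frequency:

```python
# Hypothetical three-letter dictionary, only for illustration
letter_frequency = {'A': 2, 'B': 0, 'C': 5}

keys_view = letter_frequency.keys()

# In Python 3 this is a view object, not a list
print(type(keys_view).__name__)     # dict_keys
print(isinstance(keys_view, list))  # False

# Converting it gives a plain list of labels, one per bar
labels = list(keys_view)
print(labels)                       # ['A', 'B', 'C']
```

In Python 2.x, keys() already returned a list, which is why the same plotting code needed no conversion there.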
Upvotes: 1