Reputation: 61
I need to write a program for my class that reads messy text from a file and gives it a book form, so from this input:
This is programing story , for programmers . One day a variable
called
v comes to a bar and ordred some whiskey, when suddenly
a new variable was declared .
a new variable asked : " What did you ordered? "
I get this output:
This is programing story,
for programmers. One day
a variable called v comes
to a bar and ordred some
whiskey, when suddenly a
new variable was
declared. A new variable
asked: "what did you
ordered?"
I am a total beginner at programming, and here is my code:
# Slovak names: vypis = "printout", cely_text = "whole text", riadok = "line", vymen = "replace"
def vypis(t):
    cely_text = ''
    for riadok in t:
        cely_text += riadok.strip()
    a = 0
    for i in range(0,80):
        if cely_text[0+a] == " " and cely_text[a+1] == " ":
            cely_text = cely_text.replace ("  ", " ")
        a+=1
    d=0
    for c in range(0,80):
        if cely_text[0+d] == " " and (cely_text[a+1] == "," or cely_text[a+1] == "." or cely_text[a+1] == "!" or cely_text[a+1] == "?"):
            cely_text = cely_text.replace (" ", "")
        d+=1
def vymen(riadok):
    for ch in riadok:
        if ch in '.,":':
            riadok = riadok[ch-1].replace(" ", "")
x = int(input("Zadaj x"))  # "Zadaj x" = "Enter x"
t = open("text.txt", "r")
v = open("prazdny.txt", "w")  # "prazdny" = "empty"
print(vypis(t))
This code deletes some of the spaces, and I have also tried to delete the spaces before signs like " .,_?", but this did not work. Why? Thanks for the help :)
Upvotes: 0
Views: 609
Reputation: 8689
You want to do quite a lot of things, so let's take them in order:
Let's get the text into a convenient form (a list of strings):
>>> with open('text.txt', 'r') as f:
...     lines = f.readlines()
>>> lines
['This is programing story , for programmers . One day a variable\n',
'called\n', 'v comes to a bar and ordred some whiskey, when suddenly \n',
' a new variable was declared .\n',
'a new variable asked : " What did you ordered? "']
You have newlines all over the place. Let's replace them with spaces and join everything into a single big string:
>>> text = ' '.join(line.replace('\n', ' ') for line in lines)
>>> text
'This is programing story , for programmers . One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared . a new variable asked : " What did you ordered? "'
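Now we want to get rid of the multiple spaces. We split on whitespace (spaces, tabs, newlines, etc.) and keep only the non-empty words (str.split() with no argument already drops empty strings, so the filter is only a safety net):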
>>> words = [word for word in text.split() if word]
>>> words
['This', 'is', 'programing', 'story', ',', 'for', 'programmers', '.', 'One', 'day', 'a', 'variable', 'called', 'v', 'comes', 'to', 'a', 'bar', 'and', 'ordred', 'some', 'whiskey,', 'when', 'suddenly', 'a', 'new', 'variable', 'was', 'declared', '.', 'a', 'new', 'variable', 'asked', ':', '"', 'What', 'did', 'you', 'ordered?', '"']
Let's join our words with spaces again (only one this time):
>>> text = ' '.join(words)
>>> text
'This is programing story , for programmers . One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared . a new variable asked : " What did you ordered? "'
We now want to remove all the <SPACE>. and <SPACE>, sequences (and the same for the other signs):
>>> for char in (',', '.', ':', '"', '?', '!'):
...     text = text.replace(' ' + char, char)
>>> text
'This is programing story, for programmers. One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared. a new variable asked:" What did you ordered?"'
OK, the work is not done: the quotation marks (") are still messed up, the capital letters are not set, etc. You can keep updating your text incrementally. For the capital letters, consider for instance:
>>> sentences = text.split('.')
>>> sentences
['This is programing story, for programmers', ' One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared', ' a new variable asked:" What did you ordered?"']
See how you can fix it? The trick is to use only string transformations such that:
This way you can compose them and improve your text incrementally.
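As an illustration of such composable steps, here is a minimal sketch, assuming Python 3. Every function name below (collapse_spaces, tighten_punctuation, fix_quotes, capitalize_sentences, compose) is hypothetical; each step takes a string and returns a string, and the capitalization step only treats "." as a sentence end:

```python
import re
from functools import reduce

# Hypothetical helper names; each step is a plain str -> str function.

def collapse_spaces(text):
    # Split on any whitespace and rejoin with single spaces
    return ' '.join(text.split())

def tighten_punctuation(text):
    # Remove the spaces in front of , . : ? !
    return re.sub(r' +([,.:?!])', r'\1', text)

def fix_quotes(text):
    # Trim the spaces just inside a pair of quotes: '" What? "' -> '"What?"'
    text = re.sub(r'"\s*(.*?)\s*"', r'"\1"', text)
    # Make sure a colon is followed by a space: 'asked:"' -> 'asked: "'
    return re.sub(r':(?=\S)', ': ', text)

def capitalize_sentences(text):
    # Upper-case the first letter after every '.' (only '.' is handled here)
    fixed = []
    for part in text.split('.'):
        part = part.lstrip()
        if part:
            part = part[0].upper() + part[1:]
        fixed.append(part)
    return '. '.join(fixed).rstrip()

def compose(*steps):
    # Apply the steps from left to right
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)

tidy = compose(collapse_spaces, tighten_punctuation, fix_quotes,
               capitalize_sentences)

messy = ('This is programing story , for programmers . One day a variable  '
         'called  v comes to a bar and ordred some whiskey, when suddenly   '
         ' a new variable was declared .  a new variable asked : '
         '" What did you ordered? "')
print(tidy(messy))
```

Keeping each step small makes it easy to test on its own and to reorder. A known weak spot is fix_quotes, which would also insert a space into something like 10:30.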
Once you have a nicely formatted text, like this:
>>> text
'This is programing story, for programmers. One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared. A new variable asked: "what did you ordered?"'
You have to define similar syntactic rules for printing it out in book format. Consider for instance the function:
>>> def prettyprint(text):
...     return '\n'.join(text[i:i+50] for i in range(0, len(text), 50))
It will print the text in chunks of exactly 50 characters (only the last one may be shorter), cutting words wherever the limit falls:
>>> print(prettyprint(text))
This is programing story, for programmers. One day
 a variable called v comes to a bar and ordred som
e whiskey, when suddenly a new variable was declar
ed. A new variable asked: "what did you ordered?"
Not bad, but it can be better. Just as we previously juggled text, lines, sentences and words to match the syntactic rules of the English language, we want to do exactly the same to match the syntactic rules of printed books.
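For the most basic of those rules, never cutting a word in the middle, Python's standard textwrap module already implements greedy word wrapping. A minimal sketch; the width of 25 is an assumption, chosen because it appears to reproduce the line breaks of the expected output in the question:

```python
import textwrap

text = ('This is programing story, for programmers. One day a variable '
        'called v comes to a bar and ordred some whiskey, when suddenly a '
        'new variable was declared. A new variable asked: "what did you '
        'ordered?"')

# Greedy wrapping: put as many whole words on each line as fit in 25 columns
print(textwrap.fill(text, width=25))
```

textwrap.wrap returns the same lines as a list instead of one string, which is handy if you want to post-process each line.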
In that case, both the English language and printed books work on the same units: words, arranged in sentences. This suggests we might want to work on these directly. A simple way to do that is to define your own objects:
>>> class Sentence(object):
...     def __init__(self, content, punctuation):
...         self.content = content
...         self.endby = punctuation
...     def pretty(self):
...         nice = []
...         content = self.content.pretty()
...         # A sentence starts with a capital letter
...         nice.append(content[0].upper())
...         # The rest has already been prettified by the content
...         nice.extend(content[1:])
...         # Do not forget the punctuation sign
...         nice.append(self.endby)
...         return ''.join(nice)
>>> class Paragraph(object):
...     def __init__(self, sentences):
...         self.sentences = sentences
...     def pretty(self):
...         # Separating our sentences by a single space
...         return ' '.join(sentence.pretty() for sentence in self.sentences)
And so on for the other elements (Proposition, Propositions, ...). This way you can represent your text as:
>>> Paragraph([
...     Sentence(
...         Propositions([Proposition(['this',
...                                    'is',
...                                    'programming',
...                                    'story']),
...                       Proposition(['for',
...                                    'programmers'])],
...                      ','),
...         '.'),
...     Sentence(...
etc...
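Proposition and Propositions are not defined above, so here is one hypothetical way to fill them in, together with compact versions of Sentence and Paragraph, just enough to make a small tree print itself. It is a sketch, not a complete model of English punctuation:

```python
class Proposition:
    """Hypothetical: a run of words between two punctuation signs."""
    def __init__(self, words):
        self.words = words

    def pretty(self):
        return ' '.join(self.words)


class Propositions:
    """Hypothetical: several propositions separated by one sign."""
    def __init__(self, propositions, punctuation):
        self.propositions = propositions
        self.punctuation = punctuation

    def pretty(self):
        # The sign sticks to the word on its left: 'story' + ', ' + 'for ...'
        return (self.punctuation + ' ').join(p.pretty() for p in self.propositions)


class Sentence:
    def __init__(self, content, punctuation):
        self.content = content
        self.endby = punctuation

    def pretty(self):
        content = self.content.pretty()
        # Capital first letter, then the rest, then the end sign
        return content[0].upper() + content[1:] + self.endby


class Paragraph:
    def __init__(self, sentences):
        self.sentences = sentences

    def pretty(self):
        return ' '.join(sentence.pretty() for sentence in self.sentences)


paragraph = Paragraph([
    Sentence(Propositions([Proposition(['this', 'is', 'programing', 'story']),
                           Proposition(['for', 'programmers'])], ','), '.'),
    Sentence(Propositions([Proposition(['one', 'day', 'a', 'variable'])], ','), '.'),
])
print(paragraph.pretty())
```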
Converting from a string (even a messed-up one) to such a tree is relatively straightforward, as you only break it down into the smallest possible elements. When you want to print it in book format, you can define your own book methods on each element of the tree, e.g. like this, passing around the current line, the output lines and the current offset on the current line:
class Proposition(object):
    ...
    def book(self, line, lines, offset, line_length):
        # offset = length of ' '.join(line) plus one trailing space
        for word in self.words:
            if line and offset + len(word) > line_length:
                # The word does not fit: flush the current line
                lines.append(' '.join(line))
                line = []
                offset = 0
            line.append(word)
            offset += len(word) + 1
        return line, lines, offset
    ...

class Propositions(object):
    ...
    def book(self, line, lines, offset, line_length):
        line, lines, offset = self.Proposition1.book(line, lines, offset, line_length)
        if len(line) > 1 and offset - 1 + len(self.punctuation) > line_length:
            # Need to add the punctuation sign with the last word
            # to a new line
            word = line.pop()
            lines.append(' '.join(line))
            line = [word + self.punctuation]
            offset = len(word + self.punctuation) + 1
        else:
            # The sign fits: glue it to the last word
            line[-1] += self.punctuation
            offset += len(self.punctuation)
        line, lines, offset = self.Proposition2.book(line, lines, offset, line_length)
        return line, lines, offset
And work your way up to Sentence, Paragraph, Chapter...
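To see the idea run end to end, here is a simplified, self-contained sketch. It assumes a flatter tree than above (a hypothetical Sentence holding a list of Proposition objects, with commas between them and an end sign after the last one) and moves the line bookkeeping into two hypothetical helpers, place and attach. As above, offset is the length of the current line plus one trailing space:

```python
# Hypothetical helpers and classes; a sketch of the "book" idea only.

def place(word, line, lines, offset, width):
    """Append one word, starting a new line first if it would not fit."""
    if line and offset + len(word) > width:
        lines.append(' '.join(line))
        line, offset = [], 0
    line.append(word)
    return line, lines, offset + len(word) + 1


def attach(sign, line, lines, offset, width):
    """Glue a punctuation sign to the last word, moving both down on overflow."""
    if offset - 1 + len(sign) > width and len(line) > 1:
        word = line.pop()
        lines.append(' '.join(line))
        line, offset = [word], len(word) + 1
    line[-1] += sign
    return line, lines, offset + len(sign)


class Proposition:
    def __init__(self, words):
        self.words = words  # assumed non-empty

    def book(self, line, lines, offset, width):
        for word in self.words:
            line, lines, offset = place(word, line, lines, offset, width)
        return line, lines, offset


class Sentence:
    def __init__(self, propositions, end='.'):
        self.propositions = propositions
        self.end = end

    def book(self, line, lines, offset, width):
        last = len(self.propositions) - 1
        for i, proposition in enumerate(self.propositions):
            line, lines, offset = proposition.book(line, lines, offset, width)
            # Commas between propositions, the end sign after the last one
            sign = self.end if i == last else ','
            line, lines, offset = attach(sign, line, lines, offset, width)
        return line, lines, offset


class Paragraph:
    def __init__(self, sentences):
        self.sentences = sentences

    def book(self, width):
        line, lines, offset = [], [], 0
        for sentence in self.sentences:
            line, lines, offset = sentence.book(line, lines, offset, width)
        if line:  # flush the last, partially filled line
            lines.append(' '.join(line))
        return lines


story = Paragraph([
    Sentence([Proposition(['This', 'is', 'programing', 'story']),
              Proposition(['for', 'programmers'])]),
    Sentence([Proposition(['One', 'day', 'a', 'variable', 'called', 'v',
                           'comes', 'to', 'a', 'bar'])]),
])
for row in story.book(25):
    print(row)
```

Each level only knows how to lay out its own children, which is what makes it easy to add a Chapter level on top later.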
This is a very simplistic implementation of what is actually a non-trivial problem: it does not take into account syllabification or justification (which you would probably like to have), but this is the way to go.
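If you do want justification, one common approach is to wrap greedily and then widen the gaps between words until every line except the last reaches the full width. A minimal sketch, assuming extra spaces go to the leftmost gaps first (justify and justify_paragraph are hypothetical names, and there is still no hyphenation):

```python
import textwrap

def justify(line, width):
    """Pad the gaps between words so that the line is exactly `width` wide."""
    words = line.split()
    if len(words) < 2:
        return line  # nothing to stretch
    gaps = len(words) - 1
    spaces = width - sum(len(word) for word in words)
    base, extra = divmod(spaces, gaps)
    # The leftmost `extra` gaps get one additional space
    padded = [word + ' ' * (base + (1 if i < extra else 0))
              for i, word in enumerate(words[:-1])]
    return ''.join(padded) + words[-1]


def justify_paragraph(text, width):
    lines = textwrap.wrap(text, width)
    # The last line of a paragraph stays left-aligned
    return [justify(line, width) for line in lines[:-1]] + lines[-1:]


text = ('This is programing story, for programmers. One day a variable '
        'called v comes to a bar and ordred some whiskey, when suddenly a '
        'new variable was declared. A new variable asked: "what did you '
        'ordered?"')
for row in justify_paragraph(text, 25):
    print(row)
```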
Note that I did not mention the string module, string formatting or regular expressions, which are tools to use once you can define your syntactic rules or transformations. These are extremely powerful tools, but the most important thing here is to know exactly the algorithm that transforms an invalid string into a valid one. Once you have some working pseudocode, regexps and format strings can help you implement it with less pain than plain character iteration. (In my previous example of a tree of words, for instance, regexps can tremendously ease the construction of the tree, and Python's powerful string formatting functions can make writing the book or pretty methods much easier.)
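For instance, a single regular expression can already split the messy input into words and punctuation signs, after which grouping them into sentences and propositions is a simple loop. A minimal sketch; tokenize and to_tree are hypothetical names, the result uses plain nested lists instead of the classes above, and quotation marks are simply kept as words:

```python
import re

def tokenize(text):
    # A token is either a run of word characters or a single other
    # non-space character (punctuation, quotes, ...)
    return re.findall(r'\w+|[^\w\s]', text)


def to_tree(tokens):
    """Group tokens into (propositions, end_sign) pairs, one per sentence."""
    sentences, propositions, words = [], [], []
    for token in tokens:
        if token in {'.', '?', '!'}:
            propositions.append(words)
            sentences.append((propositions, token))
            propositions, words = [], []
        elif token in {',', ';', ':'}:
            propositions.append(words)
            words = []
        else:
            words.append(token)
    if words or propositions:  # trailing text without a final sign
        propositions.append(words)
        sentences.append((propositions, ''))
    return sentences


print(to_tree(tokenize('This is programing story , for programmers . One day')))
```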
Upvotes: 3
Reputation: 17018
To strip the multiple spaces you could use a simple regex substitution.
import re
cely_text = re.sub(' +',' ', cely_text)
Then for punctuation you could run a similar sub:
cely_text = re.sub(r' +([,.:])', r'\g<1>', cely_text)
Upvotes: 1