kultur

Reputation: 43

Stripping consecutive duplicate words from generated text in a Python script

I wrote a Python script that takes text from an input file and randomly rearranges the words, for a creative writing project based on the cut-up technique (http://en.wikipedia.org/wiki/Cut-up_technique).

Here's the script as it currently stands. NB: I'm running this as a server side include.

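#!/usr/bin/python
from random import shuffle

# Read the whole source text; the with-block closes the file automatically
with open("input.txt", "r") as src:
    srcText = src.read()

srcList = srcText.split()      # split on whitespace into a list of words
shuffle(srcList)               # shuffle the list in place
cutUpText = " ".join(srcList)
# Header, blank line, then the shuffled text
print("Content-type: text/html\n\n" + cutUpText)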

This basically does what I want, but I'd like to improve it by finding and removing duplicate words in the output. To clarify, I only want to remove consecutive duplicates, for example "the the" or "I I I". I don't want "the" to appear only once in the entire output.

Can someone point me in the right direction to start solving this problem? (My background isn't in programming at all, so I put this script together by reading bits of the Python manual and browsing this site. Please be gentle with me.)

Upvotes: 4

Views: 495

Answers (3)

AshishTheDev

Reputation: 21

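Add this to your existing program in place of the srcList = srcText.split() line. Note that set() removes every repeated word from the whole text, not just consecutive repeats: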

srcList = list(set(srcText.split()))
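
A quick check on a made-up sample sentence (not taken from the question's input.txt) illustrates the effect: set() keeps a single copy of each distinct word and makes no promise about order, which may not be what the question is after.

```python
# Hypothetical sample text, used only for illustration
words = "the cat the the dog".split()

# set() drops every repeat, wherever it occurs in the text
unique_words = set(words)

print(len(words))         # 5 words in the sample
print(len(unique_words))  # 3 distinct words: the, cat, dog
```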

Upvotes: 0

Arcturus

Reputation: 548

Adding the lines

spaces = ['\n' if i % 10 == 9 else ' ' for i in range(len(srcList))]
cutUpText = "".join(word + sep for word, sep in zip(srcList, spaces))

inserts a newline after every tenth word, which gives the output some basic line formatting.

Upvotes: 1

Ned Batchelder

Reputation: 375804

You can write a generator that yields the words while skipping any word equal to the one just before it:

def nodups(s):
    last = None
    for w in s:
        if w == last:
            continue
        yield w
        last = w

Then you can use this in your program:

cutUpText = " ".join(nodups(srcList))
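
For anyone who wants to try this in isolation, here is a small self-contained sketch. It repeats the generator above and adds an alternative based on itertools.groupby from the standard library, which groups runs of equal consecutive items. The sample list and the name nodups_groupby are made up for illustration.

```python
from itertools import groupby

def nodups(s):
    # Yield each word unless it equals the word immediately before it
    last = None
    for w in s:
        if w == last:
            continue
        yield w
        last = w

def nodups_groupby(words):
    # groupby yields one (key, group) pair per run of equal consecutive items,
    # so keeping only the keys collapses each run to a single word
    return [key for key, _group in groupby(words)]

# Hypothetical shuffled output containing consecutive repeats
sample = ["the", "the", "cat", "I", "I", "I", "saw", "the"]

print(" ".join(nodups(sample)))          # the cat I saw the
print(" ".join(nodups_groupby(sample)))  # the cat I saw the
```

Both versions leave the final "the" in place, because it is not adjacent to the earlier ones.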

Upvotes: 5
