Reputation: 43
I wrote a Python script that takes text from an input file and randomly rearranges the words, for a creative writing project based on the cut-up technique (http://en.wikipedia.org/wiki/Cut-up_technique).
Here's the script as it currently stands. NB: I'm running this as a server-side include.
#!/usr/bin/python
from random import shuffle
src = open("input.txt", "r")
srcText = src.read()
src.close()
srcList = srcText.split()
shuffle(srcList)
cutUpText = " ".join(srcList)
print("Content-type: text/html\n\n" + cutUpText)
This basically does the job I want it to do, but one improvement I'd like to make is to find words that are repeated back to back in the output and remove the repeats. To clarify, I only want to collapse consecutive duplicates, for example "the the" or "I I I". I don't want to make it so that, for example, "the" only appears once in the entire output.
Can someone point me in the right direction to start solving this problem? (My background isn't in programming at all, so I put this script together by reading bits of the Python manual and browsing this site. Please be gentle with me.)
Upvotes: 4
Views: 495
Reputation: 21
Add this to your existing program:
srcList = list(set(srcText.split()))
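Note that a set keeps each distinct word only once across the whole text and does not preserve word order, so this removes every repeat rather than only back-to-back ones. A small sketch of the effect, using a made-up sample sentence (the names `words` and `unique_everywhere` are just for illustration):

```python
# Hypothetical sample text; "the" appears three times, twice in a row.
words = "the cat and the the dog".split()

# A set keeps one copy of each distinct word, wherever it occurred.
unique_everywhere = set(words)

# Sort only so the printed result is stable; a set has no defined order.
print(sorted(unique_everywhere))
```

Here all three occurrences of "the" collapse into one, including the non-adjacent one at the start.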
Upvotes: 0
Reputation: 548
Adding the lines
spaces = ['\n' if i % 10 == 9 else ' ' for i in range(len(srcList))]
cutUpText = "".join(word + sep for word, sep in zip(srcList, spaces))
puts a line break after every tenth word, which gives the output some rough formatting on screen.
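The same idea can be sketched by slicing the word list into chunks of ten and joining each chunk. This is only one possible way to do it; `wrap_words` and `per_line` are hypothetical names, and unlike the lines above it does not leave a trailing space or newline after the last word.

```python
def wrap_words(words, per_line=10):
    """Join words with spaces, starting a new line after every per_line words."""
    lines = [
        " ".join(words[i:i + per_line])
        for i in range(0, len(words), per_line)
    ]
    return "\n".join(lines)

# Twelve placeholder "words" to show where the break falls.
demo = wrap_words([str(n) for n in range(12)])
print(demo)
```

In the script this would be used as `cutUpText = wrap_words(srcList)`.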
Upvotes: 1
Reputation: 375804
You can write a generator to produce words without duplicates:
def nodups(s):
    last = None
    for w in s:
        if w == last:
            continue
        yield w
        last = w
Then you can use this in your program:
cutUpText = " ".join(nodups(srcList))
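As an alternative sketch, the standard library's `itertools.groupby` groups runs of equal adjacent items, so keeping one key per run gives the same result as the generator. The function name `nodups_groupby` and the sample sentence are made up for illustration.

```python
from itertools import groupby

def nodups_groupby(words):
    """Return words with each run of identical adjacent words collapsed to one."""
    # groupby yields (key, run) pairs for consecutive equal items;
    # only the key is needed here.
    return [key for key, _run in groupby(words)]

# Hypothetical shuffled output containing back-to-back repeats.
sample = "the the cat sat on the the the mat I I I".split()
collapsed = " ".join(nodups_groupby(sample))
print(collapsed)
```

Only adjacent repeats are dropped, so "the" can still appear more than once in the result as long as the copies are not next to each other.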
Upvotes: 5