Reputation: 1183
I have the following sentence:
sentence = "<s> online auto body <s>"
And I would like first to make words 3-grams out of it as:
('<s>', 'online', 'auto')
('online', 'auto', 'body')
('auto', 'body', '<s>')
To do so I used the following code:
from nltk import ngrams  # assumed source of ngrams(); the question does not show its import

sentence = '<s> online auto body <s>'
n = 3
word_3grams = ngrams(sentence.split(), n)
for grams in word_3grams:
    print(grams)
Now, I would like to get "#" at the beginning and at the end of every word, as follows:
('#<s>#', '#online#', '#auto#')
('#online#', '#auto#', '#body#')
('#auto#', '#body#', '#<s>#')
But I don't know how to get this. As a side note, the elements here are tuples, but I wouldn't mind using lists.
Upvotes: 1
Views: 1000
Reputation: 7844
Here is a solution from the beginning:
sentence = "<s> online auto body <s>"
n = 3
# Split the sentence into words and append the '#' symbol.
words = tuple(map(lambda w: '#'+w+'#', sentence.split()))
# Create a list of elements consisting of three consecutive words.
splits = [words[i:i+n] for i in range(len(words)-(n-1))]
# Print results.
for elem in splits:
    print(elem)
Output:
('#<s>#', '#online#', '#auto#')
('#online#', '#auto#', '#body#')
('#auto#', '#body#', '#<s>#')
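The same approach can be wrapped in a small function so that the window size and the padding character are parameters. This is only a sketch: `padded_ngrams` and its `pad` argument are hypothetical names, not part of any library.

```python
def padded_ngrams(sentence, n, pad="#"):
    """Return the word n-grams of `sentence`, with `pad` around every word.

    Hypothetical helper for illustration; it splits on whitespace only.
    """
    # Wrap each word first, so the padding is applied once per word.
    words = [pad + w + pad for w in sentence.split()]
    # Slide a window of width n over the padded words.
    # If there are fewer than n words, the range is empty and so is the result.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


trigrams = padded_ngrams("<s> online auto body <s>", 3)
for gram in trigrams:
    print(gram)
```

Padding before slicing keeps the two steps independent, so the same function also works for bigrams or any other `n`.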
Upvotes: 0
Reputation: 1217
You can do this using a list comprehension and the format function:
word_3grams = [('<s>', 'online', 'auto'),
               ('online', 'auto', 'body'),
               ('auto', 'body', '<s>')]
for grams in word_3grams:
    # print() with parentheses works in both Python 2 and Python 3.
    print(["{pad}{data}{pad}".format(pad='#', data=s) for s in grams])
['#<s>#', '#online#', '#auto#']
['#online#', '#auto#', '#body#']
['#auto#', '#body#', '#<s>#']
Upvotes: 0
Reputation: 2821
You want a sliding-window-like feature.
from itertools import islice
sentence = "<s> online auto body <s>"
myList = sentence.split()
myList = ['#' + word + '#' for word in myList]
slidingWindow = [islice(myList, s, None) for s in range(3)]
print(list(zip(*slidingWindow)))
# [('#<s>#', '#online#', '#auto#'), ('#online#', '#auto#', '#body#'), ('#auto#', '#body#', '#<s>#')]
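If the window width should not be hard-coded to 3, or the input is a generator rather than a list, the same idea can be written with `itertools.tee`. This is a sketch under those assumptions; `sliding_window` is a hypothetical helper name.

```python
from itertools import islice, tee


def sliding_window(items, n):
    """Return an iterator of consecutive n-tuples from any iterable (hypothetical helper)."""
    # Make n independent copies of the input iterator.
    copies = tee(items, n)
    # Skip k items in the k-th copy so the copies are staggered by one position each.
    staggered = [islice(copy, k, None) for k, copy in enumerate(copies)]
    # zip() stops as soon as the shortest (most advanced) copy is exhausted.
    return zip(*staggered)


# The padded words are produced lazily by a generator expression.
words = ('#' + w + '#' for w in "<s> online auto body <s>".split())
windows = list(sliding_window(words, 3))
print(windows)
```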
Upvotes: 1
Reputation: 447
In Python a tuple is immutable, which means it can't be modified in place. As you suggested, it is simpler to use lists, and more precisely a list comprehension:
aList = ['auto', 'body', '<s>']
newList = ['#' + item + '#' for item in aList]
print(newList)
# ['#auto#', '#body#', '#<s>#']
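Immutability does not rule out tuples, though: if tuples are preferred, a new tuple can be built from each existing one instead of modifying it. A minimal sketch, assuming the 3-grams have already been collected into a list named `grams` (a name chosen here for illustration):

```python
grams = [('<s>', 'online', 'auto'),
         ('online', 'auto', 'body'),
         ('auto', 'body', '<s>')]

# tuple(...) consumes the generator expression and builds a fresh tuple per 3-gram;
# the original tuples are left untouched.
padded = [tuple('#' + word + '#' for word in gram) for gram in grams]

for gram in padded:
    print(gram)
```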
Upvotes: 0
Reputation: 11368
If you just want to change the strings, try:
# In Python 3, map() returns a lazy iterator, so wrap it in list() to get the strings.
list(map(lambda s: "#" + s + "#", sentence.split()))
Upvotes: 0