Marisa

Reputation: 1183

How to add character to a string element in a tuple or list?

I have the following sentence:

sentence = "<s> online auto body <s>" 

First, I would like to build word 3-grams from it, like this:

('<s>', 'online', 'auto')
('online', 'auto', 'body')
('auto', 'body', '<s>')

To do so I used the following code:

# Assuming ngrams here is NLTK's helper.
from nltk.util import ngrams

sentence = '<s> online auto body <s>'
n = 3
word_3grams = ngrams(sentence.split(), n)
for grams in word_3grams:
    print(grams)

Now, I would like to get "#" at the beginning and at the end of every word, as follows:

('#<s>#', '#online#', '#auto#')
('#online#', '#auto#', '#body#')
('#auto#', '#body#', '#<s>#')

But I don't know how to do that. As a side note, the elements here are tuples, but I wouldn't mind using lists instead.

Upvotes: 1

Views: 1000

Answers (5)

Vasilis G.

Reputation: 7844

Here is a solution that starts from the raw sentence and does not need an n-gram helper:

sentence = "<s> online auto body <s>" 
n = 3

# Split the sentence into words and append the '#' symbol.
words = tuple(map(lambda w: '#'+w+'#', sentence.split()))

# Create a list of elements consisting of three consecutive words.
splits = [words[i:i+n] for i in range(len(words)-(n-1))]

# Print results.
for elem in splits:
    print(elem)

Output:

('#<s>#', '#online#', '#auto#')
('#online#', '#auto#', '#body#')
('#auto#', '#body#', '#<s>#')

Upvotes: 0

akshat

Reputation: 1217

You can do this with a list comprehension and str.format:

word_3grams = [('<s>', 'online', 'auto'),
               ('online', 'auto', 'body'),
               ('auto', 'body', '<s>')]

for grams in word_3grams:
    # Wrap each word in the pad character; the result is a list, not a tuple.
    print(["{pad}{data}{pad}".format(pad='#', data=s) for s in grams])

['#<s>#', '#online#', '#auto#']
['#online#', '#auto#', '#body#']
['#auto#', '#body#', '#<s>#']

Upvotes: 0

BcK

Reputation: 2821

What you want is a sliding window over the words.

from itertools import islice

sentence = "<s> online auto body <s>"
myList = sentence.split()
myList = ['#' + word + '#' for word in myList]

slidingWindow = [islice(myList, s, None) for s in range(3)]
print(list(zip(*slidingWindow)))

# [('#<s>#', '#online#', '#auto#'), ('#online#', '#auto#', '#body#'), ('#auto#', '#body#', '#<s>#')]
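The same idea works for any window size. Here is a minimal sketch; `sliding_window` is a name I made up for illustration, and it assumes the input is a list (or another sequence that can be iterated more than once), not a one-shot iterator:

```python
from itertools import islice

def sliding_window(items, n):
    """Return every run of n consecutive items as a tuple (hypothetical helper)."""
    # Start one iterator at each offset 0..n-1; zip stops at the shortest one.
    offsets = [islice(items, start, None) for start in range(n)]
    return list(zip(*offsets))

sentence = "<s> online auto body <s>"
words = ['#' + word + '#' for word in sentence.split()]

for gram in sliding_window(words, 3):
    print(gram)
```

Changing the second argument to 2 should give bigrams instead, with no other changes.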

Upvotes: 1

IvanJijon

Reputation: 447

In Python, a tuple is immutable, which means its elements can't be changed in place. As you suggested, it is easier to work with lists, and more precisely with a list comprehension:

aList = ['auto', 'body', '<s>']
newList = ['#' + item + '#' for item in aList]
print(newList)
# ['#auto#', '#body#', '#<s>#']
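If you do need tuples in the end, immutability is not really a blocker: you can build a new tuple from each old one. A rough sketch, where `pad_gram` is a hypothetical helper name and the 3-grams are typed in by hand rather than generated:

```python
def pad_gram(gram, pad="#"):
    """Return a new tuple with pad added to both ends of each word (hypothetical helper)."""
    # tuple(...) consumes the generator and builds a fresh tuple;
    # the original gram is left untouched.
    return tuple(pad + word + pad for word in gram)

word_3grams = [('<s>', 'online', 'auto'),
               ('online', 'auto', 'body'),
               ('auto', 'body', '<s>')]

padded = [pad_gram(gram) for gram in word_3grams]
for gram in padded:
    print(gram)
```

The same call should work on whatever iterable of tuples your n-gram function returns.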

Upvotes: 0

csl

Reputation: 11368

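If you just want to change the strings before building the n-grams, try:

# In Python 3, map() returns a lazy iterator, so wrap it in list() to get the words.
list(map(lambda s: "#" + s + "#", sentence.split()))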

Upvotes: 0
