Reputation: 23
I have a list called tokens and would like to format this list so that when I print it, it is human readable.
The list:
tokens = ['<h1>','Hello','World','</h1>','<p>','Welcome','to','this','planet','</p>']
What I would like the output to look like once formatted:
Heading: Hello World
Paragraph: Welcome to this planet
What I have tried so far:
I first tried to replace the <h1> and <p> tags so that the output shows 'Heading: ' and 'Paragraph: ' instead. I used a for loop to go through all the tokens and find the tags to replace:
for token in tokens:
    # compare the token against the opening tags
    if token == '<h1>':
        print(token.replace('<h1>', 'Heading: '))
    elif token == '<p>':
        print(token.replace('<p>', 'Paragraph: '))
The next part I need to do is print out the sentences between the <h1> tags and the <p> tags. For this I thought of creating a method; the general pseudocode is:
def between(tokens, tag, endTag):
    if token is between tag and endTag:
        print the sentence
I don't really know how to get this method to work in Python and have tried something like this:
def between(tokens, tag, endTag):
    sentence = []
    for token in tokens:
        if (token > tag and token < endTag):
            sentence.append(token)
    return sentence
but I know the if statement does not make sense and does not work out overall. How can I solve this problem and format the list correctly?
Upvotes: 2
Views: 99
Reputation: 6748
You could try this:
" ".join('@#'.join([e for e in tokens if '</' not in e]).replace("<h1>","\n Heading:").replace("<p>","\n Paragraph:").split("@#"))
This works as long as none of your tokens contains the sequence @#, which is used as a temporary separator.
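For example (a usage sketch, using the tokens list from the question), printing the result gives output close to the desired format, apart from a leading blank line and some extra spaces:
result = " ".join('@#'.join([e for e in tokens if '</' not in e]).replace("<h1>", "\n Heading:").replace("<p>", "\n Paragraph:").split("@#"))
print(result)
#
#  Heading: Hello World
#  Paragraph: Welcome to this planet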
Upvotes: 0
Reputation: 57033
You can create a dictionary of human-readable tag names and replace a tag with its name. If a token is not a tag, it is not replaced.
tags = {"<h1>" : 'Heading1: ', "</h1>" : "\n",
"<p>" : "Paragraph: ", "</p>" : "\n", ... }
new_tokens = [tags.get(token.lower(),token) for token in tokens]
print("".join(new_tokens))
#Heading1: HelloWorld
#Paragraph: Welcometothisplanet
The .lower() function call makes the lookup case-insensitive.
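Not part of the original answer, but if you also want spaces between the words, as in the desired output, a minimal sketch along the same lines is to join with a space and then strip each resulting line:
tags = {"<h1>": "Heading:", "</h1>": "\n", "<p>": "Paragraph:", "</p>": "\n"}
new_tokens = [tags.get(token.lower(), token) for token in tokens]
text = " ".join(new_tokens)
print("\n".join(line.strip() for line in text.splitlines() if line.strip()))
# Heading: Hello World
# Paragraph: Welcome to this planet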
Upvotes: 2