R.swan

Reputation: 23

How to format a list containing tags in python

I have a list called tokens and would like to format this list so that when I print it, it is human readable.

The list:

tokens = ['<h1>','Hello','World','</h1>','<p>','Welcome','to','this','planet','</p>']

What I would like the output to look like once formatted:

Heading: Hello World

Paragraph: Welcome to this planet

What I have tried so far:

I first tried to replace the <h1> and <p> tags so that the output shows 'Heading: ' and 'Paragraph: ' instead. I used a for loop to go through all the tokens and find the tags to replace:

for token in tokens:
    # comparing strings
    if token == '<h1>':
        print(token.replace('<h1>', 'Heading: '))
    elif token == '<p>':
        print(token.replace('<p>', 'Paragraph: '))

The next thing I need to do is print the sentences between the <h1> tags and between the <p> tags. For this I thought of writing a function; the general pseudocode is:

def between(tokens, tag, endTag)
    if token is between tag and endTag
        print the sentence 

I don't really know how to get this function to work in Python and have tried something like this:

def between(tokens, tag, endTag):
    sentence = []
    for token in tokens:
        if(token > tag and token < endTag):
            sentence.append(token)
    return sentence

but I know the if statement does not make sense and does not work out overall. How can I solve this problem and format the list correctly?

Upvotes: 2

Views: 99

Answers (2)

whackamadoodle3000

Reputation: 6748

You could try this:

" ".join('@#'.join([e for e in tokens if '</' not in e]).replace("<h1>","\n Heading:").replace("<p>","\n Paragraph:").split("@#"))

This assumes that none of your tokens contains the string @#.
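The one-liner is easier to follow when split into steps. Here is a sketch of the same idea; the names SEP and format_tokens are illustrative and not part of the original answer:

```python
SEP = "@#"  # illustrative sentinel; it must not occur inside any token


def format_tokens(tokens):
    # Drop closing tags such as </h1> and </p>.
    kept = [t for t in tokens if "</" not in t]
    # Glue the tokens into one string so the tags can be replaced in one pass.
    joined = SEP.join(kept)
    labelled = joined.replace("<h1>", "\n Heading:").replace("<p>", "\n Paragraph:")
    # Split back into tokens and put a space between them.
    return " ".join(labelled.split(SEP))


tokens = ['<h1>', 'Hello', 'World', '</h1>', '<p>', 'Welcome', 'to', 'this', 'planet', '</p>']
print(format_tokens(tokens))
```

Each labelled line starts with a newline and a space, and a trailing space is left before each newline, so you may want to strip the lines afterwards.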

Upvotes: 0

DYZ

Reputation: 57033

You can create a dictionary of human-readable tag names and replace a tag with its name. If a token is not a tag, it is not replaced.

tags = {"<h1>" : 'Heading1: ', "</h1>" : "\n", 
        "<p>" : "Paragraph: ", "</p>" : "\n", ... }
new_tokens = [tags.get(token.lower(),token) for token in tokens]
print("".join(new_tokens))
#Heading1: HelloWorld
#Paragraph: Welcometothisplanet

The .lower() method call makes the tag lookup case-insensitive.
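Joining with an empty string runs the words together, as the output comments show. One way to keep spaces between the words is to collect the words of each element and join them when its closing tag arrives. This is a sketch, assuming tags are never nested; LABELS and format_tokens are illustrative names:

```python
LABELS = {"<h1>": "Heading", "<p>": "Paragraph"}  # opening tag -> label


def format_tokens(tokens):
    lines = []
    label, words = None, []
    for token in tokens:
        key = token.lower()  # case-insensitive tag matching
        if key in LABELS:
            # Opening tag: start collecting words for a new line.
            label, words = LABELS[key], []
        elif key.startswith("</"):
            # Closing tag: emit the finished line, if one was open.
            if label is not None:
                lines.append(f"{label}: {' '.join(words)}")
            label, words = None, []
        elif label is not None:
            # Ordinary word inside a known element.
            words.append(token)
        # Words outside any known element are skipped in this sketch.
    return "\n".join(lines)


tokens = ['<h1>', 'Hello', 'World', '</h1>', '<p>', 'Welcome', 'to', 'this', 'planet', '</p>']
print(format_tokens(tokens))
```

To support more tags, add their opening forms to LABELS.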

Upvotes: 2
