captaindogface

Reputation: 53

Find the first word of a paragraph inside a list

I have created a list that contains a different paragraph inside each element.

I want to find the first word of each paragraph.

The only thing I can come up with is to split each paragraph into individual words and take element [0]. This seems excessive, since each paragraph is already a separate element of the list.

So what is a better way to do this?

Upvotes: 2

Views: 3202

Answers (4)

Hugh Bothwell

Reputation: 56654

Good grief:

my_paras = ["It was the best of times", "Twas a dark and stormy night", "The walrus and the carpenter"]

my_first_words = [para.split(None, 1)[0] for para in my_paras]

returns

['It', 'Twas', 'The']

The None parameter to split means 'split on any run of contiguous whitespace' and is usually implicit; however, I have to specify it here in order to also supply the second positional parameter, maxsplit. By passing maxsplit=1, .split() stops after the first run of whitespace that follows a word (returning a two-item list consisting of the first word and the remainder of the paragraph) or once it hits the end of the string (returning a one-item list, the whole run-on paragraph).
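To make the maxsplit behaviour concrete, here is a small sketch. The helper name first_word, its default argument, and the sample strings are illustrative assumptions, not part of the original answer. It also covers a case the one-liner does not handle: an empty or whitespace-only paragraph makes split return an empty list, so indexing with [0] would raise IndexError.

```python
def first_word(paragraph, default=""):
    """Return the first whitespace-delimited word, or `default` if there is none.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # maxsplit=1 means at most one split is performed, so the rest of the
    # paragraph is left untouched as the second item.
    parts = paragraph.split(None, 1)
    return parts[0] if parts else default


print("It was the best of times".split(None, 1))  # ['It', 'was the best of times']
print("Supercalifragilistic".split(None, 1))       # ['Supercalifragilistic']
print("   \n ".split(None, 1))                     # []
print(first_word("  leading spaces are skipped"))  # leading
```

Whether an empty paragraph should yield an empty string, None, or be skipped entirely depends on what the list is used for afterwards.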

Upvotes: 1

bukzor

Reputation: 38482

How do you want your words laid out? Do you want to guarantee only that they contain no whitespace, or also that they contain no punctuation?

First cut:

first_words = [
        paragraph.split(None, 1)[0]
        for paragraph in paragraphs
]
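If punctuation should not be part of the word, one possible refinement is to strip it from both ends of the first token. This is only a sketch: the sample paragraphs are made up, and it assumes that every paragraph contains at least one word and that ASCII punctuation (string.punctuation) is all that needs removing.

```python
import string

# Sample data invented for illustration.
paragraphs = [
    '"Well," she said, "here we are."',
    "Twas a dark and stormy night",
    "Stop! Who goes there?",
]

first_words = [
    # Take the first whitespace-delimited token, then strip leading and
    # trailing ASCII punctuation; punctuation inside the word is kept.
    paragraph.split(None, 1)[0].strip(string.punctuation)
    for paragraph in paragraphs
]
print(first_words)  # ['Well', 'Twas', 'Stop']
```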

Upvotes: 0

Gerrat

Reputation: 29700

Something like this?

l = ['start of paragraph 1','start of paragraph 2','para 3']
first_words = [p.split()[0] for p in l]
print(first_words)

prints: ['start', 'start', 'para']

If you don't want to split each paragraph, you could search for the index of the first space, and grab each word up to that:

l = ['start of paragraph 1','start of paragraph 2','para 3']
first_words = [p[:p.find(' ')] for p in l]
print(first_words)

prints: ['start', 'start', 'para']

Explanation as requested:

  • find the first space in the paragraph with p.find(' ') - returns the position
  • then take the first characters in the paragraph via p[:p.find(' ')]
  • the remainder of that line is called a list comprehension and basically loops through your list and takes each paragraph, p in turn
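One edge case in the find-based version is worth seeing: when a paragraph contains no space at all, str.find returns -1, so the slice becomes p[:-1] and the last character is silently dropped. The sketch below shows this and one possible alternative using str.partition; the sample list is made up for illustration. Note that both variants look only for a literal space, not for tabs or newlines.

```python
# Sample data invented for illustration; 'Oneword' has no space in it.
paragraphs = ['start of paragraph 1', 'para 3', 'Oneword']

# str.find returns -1 when there is no space, so the slice becomes
# p[:-1] and loses the final character of a single-word paragraph.
with_find = [p[:p.find(' ')] for p in paragraphs]
print(with_find)  # ['start', 'para', 'Onewor']

# str.partition returns (head, separator, tail); head is the whole
# string when the separator is absent.
with_partition = [p.partition(' ')[0] for p in paragraphs]
print(with_partition)  # ['start', 'para', 'Oneword']
```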

Upvotes: 3

Assuming that each paragraph starts with a word (and not, say, a space or a number):

[par[:par.index(" ")] for par in list_of_par]

This is what is called a "list comprehension". It goes through each item in list_of_par and applies par[:par.index(" ")] to it. This takes a slice of the paragraph (par), in this case, from the 0th character up to (but not including) the first space ([:par.index(" ")]).

The list comprehension returns a list of strings; each string being all the characters in the paragraph until the first space.
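The assumption stated at the top of this answer matters in practice. As a sketch (the helper name first_word_by_index and the sample list are hypothetical), the code below shows what happens when a paragraph has no space, where str.index raises ValueError, and when it starts with a space, where the slice is empty.

```python
def first_word_by_index(par):
    """Hypothetical helper: slice up to the first space, or return the
    whole paragraph when it contains no space."""
    try:
        return par[:par.index(" ")]
    except ValueError:
        # str.index raises ValueError when the substring is not found.
        return par


# Sample data invented for illustration.
list_of_par = ["The walrus and the carpenter", "Oneword", " leading space"]
print([first_word_by_index(par) for par in list_of_par])
# ['The', 'Oneword', '']
```

The empty string for the third item illustrates why the answer assumes each paragraph starts with a word; calling par.lstrip() first would be one way to relax that.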

Upvotes: 0
