T.Python

Reputation: 19

Find first word in string Python

I have to write a single function that should return the first word in the following strings:

("Hello world") -> return "Hello"
(" a word ") -> return "a"
("don't touch it") -> return "don't"
("greetings, friends") -> return "greetings"
("... and so on ...") -> return "and"
("hi") -> return "hi"

Each call has to return the first word. As you can see, some strings start with whitespace, some words contain apostrophes, and some words are followed by commas or surrounded by other punctuation.

I've used the following options:

return text.split()[0]
return re.split(r'\w*', text)[0]

Both fail on some of these strings. Can anyone help?
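To see where the first attempt goes wrong, this sketch runs `text.split()[0]` over the six example strings. Splitting on whitespace handles the leading space in `" a word "`, but punctuation stays attached to the neighbouring token, so `"greetings, friends"` gives `"greetings,"` and `"... and so on ..."` gives `"..."`.

```python
# Run the question's first attempt over every example input.
tests = ["Hello world", " a word ", "don't touch it",
         "greetings, friends", "... and so on ...", "hi"]

results = [text.split()[0] for text in tests]
for text, word in zip(tests, results):
    print(repr(text), "->", repr(word))
```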

Upvotes: 0

Views: 8289

Answers (6)

AlexArnold

Reputation: 1

I've done this by collecting characters until the first whitespace character, which marks the end of the first word. Something like this:

stringVariable = "Hello world"  # whatever sentence you want to examine
firstWord = ""
# Add characters one at a time until the first space is reached
for ch in stringVariable:
    if ch != " ":
        firstWord = firstWord + ch
    else:
        break

This code walks through the string you want the first word of, appending each character to a new variable called firstWord until it reaches the first space. I'm not exactly sure how you would put that into a function, as I'm pretty new to this, but I'm sure it could be done!
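The loop can be turned into a function by passing the sentence in as a parameter and returning the collected characters. This is one possible sketch; the name `first_word` is only illustrative, and the `lstrip()` call is an addition not in the answer above: without it the loop stops immediately on a leading space and returns an empty string for `" a word "`. Punctuation is still kept, so `"greetings, friends"` gives `"greetings,"`.

```python
def first_word(text):
    """Return the characters of text up to the first whitespace character.

    Same idea as the loop above, but leading whitespace is skipped first so
    that " a word " gives "a" rather than an empty string.
    """
    first = ""
    for ch in text.lstrip():  # drop leading whitespace before scanning
        if ch.isspace():      # stop at the first space, tab, or newline
            break
        first += ch
    return first


print(first_word("Hello world"))
print(first_word(" a word "))
print(first_word("greetings, friends"))  # the trailing comma is kept
```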

Upvotes: 0

user9158931

Reputation:

You can try something like this:

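import re

# Runs of letters and apostrophes, e.g. "don't"
pattern = r"[a-zA-Z']+"

def first_word(text):
    matches = re.findall(pattern, text)
    for candidate in matches:
        # Skip tokens that start with an apostrophe (a stray quote)
        if candidate[0].isalnum():
            return candidate

print(first_word(("don't touch it")))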

output:

don't
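The extra parentheses around `"don't touch it"` in the call above do not create a tuple: parentheses on their own only group an expression, so the function receives a plain `str`. A one-element tuple needs a trailing comma, as this short check shows:

```python
plain = ("don't touch it")    # parentheses only group; this is a str
single = ("don't touch it",)  # the trailing comma makes a 1-tuple

print(type(plain).__name__)   # str
print(type(single).__name__)  # tuple
```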

Upvotes: 0

Abhijit

Reputation: 1856

Try the code below. I tested it with all of your inputs and it works.

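import re

text = ["Hello world", " a word ", "don't touch it", "greetings, friends", "... and so on ...", "hi"]
# Either a run that starts and ends with a word character (apostrophes allowed inside),
# or a single word character
rgx = re.compile(r"(\w[\w']*\w|\w)")
for i in text:
    out = rgx.findall(i)
    print(out[0])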

Output:

Hello
a
don't
greetings
and
hi
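Since only the first word is needed, `re.search` can replace `findall(...)[0]`: it stops at the first match, and it returns `None` instead of raising `IndexError` when the string contains no word characters at all. This is a sketch of that variant wrapped in a function; the names `WORD_RE` and `first_word` are illustrative, and the capture group is dropped because `group(0)` already returns the whole match.

```python
import re

# Same pattern as above: a word character, optional word characters or
# apostrophes, and a closing word character; or just one word character.
WORD_RE = re.compile(r"\w[\w']*\w|\w")


def first_word(text):
    """Return the first match of WORD_RE in text, or None if there is none."""
    match = WORD_RE.search(text)  # stops at the first match
    return match.group(0) if match else None


for s in ["Hello world", " a word ", "don't touch it",
          "greetings, friends", "... and so on ...", "hi"]:
    print(first_word(s))
```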

Upvotes: 2

Chris_Rands

Reputation: 41168

A non-regex solution: strip leading punctuation/whitespace characters, split the string to get the first word, then remove trailing punctuation/whitespace:

from string import punctuation, whitespace

def first_word(s):
    to_strip = punctuation + whitespace
    return s.lstrip(to_strip).split(' ', 1)[0].rstrip(to_strip)

tests = [
    "Hello world",
    " a word ",
    "don't touch it",
    "greetings, friends",
    "... and so on ...",
    "hi",
]

for test in tests:
    print('#{}#'.format(first_word(test)))

Outputs:

#Hello#
#a#
#don't#
#greetings#
#and#
#hi#
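One caveat: `split(' ', 1)` splits only on the space character, so if a tab or newline follows the first word, the rest of the line is kept (for example, `"hello\tworld"` comes back whole). This sketch of a variant uses `split(None, 1)`, which splits on any run of whitespace. Because `"".split(None, 1)` returns an empty list, it also checks for strings that are nothing but punctuation.

```python
from string import punctuation, whitespace

TO_STRIP = punctuation + whitespace


def first_word(s):
    """Strip leading punctuation/whitespace, take the first token, strip its tail.

    Variant of the answer above that splits on any whitespace, not only ' '.
    """
    parts = s.lstrip(TO_STRIP).split(None, 1)  # None = split on any whitespace run
    if not parts:  # nothing left after stripping, e.g. "..." or ""
        return ""
    return parts[0].rstrip(TO_STRIP)


print(first_word("hello\tworld"))
print(first_word("greetings, friends"))
```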

Upvotes: 1

Shen Yudong

Reputation: 1230

Try this one:

>>> import re
>>> def pm(s):
...     p = r"[a-zA-Z][\w']*"   # a letter, then word characters or apostrophes
...     m = re.search(p, s)
...     print(m.group(0))
... 

Test results:

>>> pm("don't touch it")
don't
>>> pm("Hello w")
Hello
>>> pm("greatings, friends")
greatings
>>> pm("... and so on...")
and
>>> pm("hi")
hi

Upvotes: 1

Alfe

Reputation: 59426

It is tricky to distinguish apostrophes that are part of a word from single quotes used as quotation marks. But since none of your examples use single quotes that way, I can go with this:

re.match(r'\W*(\w[^,. !?"]*)', text).groups()[0]

This works for all of your examples. It won't work for atypical input like "'tis all in vain!", though. It assumes that a word ends at a comma, period, space, exclamation mark, question mark, or double quote. You can extend this list (inside the brackets) as needed.
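`re.match` returns `None` when the text contains no word character at all (for example `"..."`), so calling `.groups()` on the result would raise `AttributeError`. This sketch wraps the expression in a function that checks for that case (the names `FIRST_WORD_RE` and `first_word` are illustrative). It also shows what the `'tis` case actually returns: the leading apostrophe is consumed by `\W*`, leaving `tis`.

```python
import re

# The pattern above: skip leading non-word characters, then capture a word
# character followed by anything that is not , . space ! ? or ".
FIRST_WORD_RE = re.compile(r'\W*(\w[^,. !?"]*)')


def first_word(text):
    """Return the first word of text, or None if no word character is found."""
    match = FIRST_WORD_RE.match(text)
    return match.group(1) if match else None


print(first_word("don't touch it"))     # don't
print(first_word("'tis all in vain!"))  # tis
print(first_word("..."))                # None
```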

Upvotes: 1
