i_love_pizza

Reputation: 1

extracting word from a sentence using split in python

I am having difficulty understanding how this piece of code works.

def get_title(name):
    if '.' in name:
        return name.split(',')[1].split('.')[0].strip()
    else:
        return 'Unknown'

ans=get_title('Braund, Mr. Owen Harris')
print (ans)

As far as I know, split is used for splitting a string into pieces, but this chain of calls does not make much sense to me.

Upvotes: 0

Views: 1144

Answers (6)

Merlin

Reputation: 25639

Does this help?

    def get_title(name):
        print(type(name), name)
        if '.' in name:
            # Print the type and value of each intermediate result in the chain.
            print(type(name.split(',')), name.split(','))
            print(type(name.split(',')[1]), name.split(',')[1])
            print(type(name.split(',')[1].split('.')), name.split(',')[1].split('.'))
            print(type(name.split(',')[1].split('.')[0]), name.split(',')[1].split('.')[0])
            print(type(name.split(',')[1].split('.')[0].strip()), name.split(',')[1].split('.')[0].strip())
            return name.split(',')[1].split('.')[0].strip()
        else:
            return 'Unknown'

    ans = get_title('Braund, Mr. Owen Harris')
    print(ans)

Output (Python 3):

<class 'str'> Braund, Mr. Owen Harris
<class 'list'> ['Braund', ' Mr. Owen Harris']
<class 'str'>  Mr. Owen Harris
<class 'list'> [' Mr', ' Owen Harris']
<class 'str'>  Mr
<class 'str'> Mr
Mr

Upvotes: 0

Szymon Stepniak

Reputation: 42184

It's easy to understand what happens if you play a little with the Python REPL. The most interesting part happens in line 3 of the code you've shown:

return name.split(',')[1].split('.')[0].strip()

Let's run it step by step in REPL to understand what happens:

>>> 'Braund, Mr. Owen Harris'.split(',')
['Braund', ' Mr. Owen Harris']
>>> 'Braund, Mr. Owen Harris'.split(',')[1]
' Mr. Owen Harris'
>>> 'Braund, Mr. Owen Harris'.split(',')[1].split('.')
[' Mr', ' Owen Harris']
>>> 'Braund, Mr. Owen Harris'.split(',')[1].split('.')[0]
' Mr'
>>> 'Braund, Mr. Owen Harris'.split(',')[1].split('.')[0].strip()
'Mr'

As you can see, this function is meant to extract titles like Mr, Ms, etc. This implementation is error-prone: if the input contains a '.' but no ',', then split(',') returns a one-element list and indexing it with [1] fails, for example:

>>> 'Braund Mr. Owen Harris'.split(',')[1].split('.')[0].strip()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
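
One way to avoid that IndexError is to use str.partition, which never raises when the separator is missing. This is only a sketch: the name get_title_safe is hypothetical, and it assumes that a name without a comma should simply yield 'Unknown'.

```python
def get_title_safe(name):
    """Hypothetical variant of get_title that tolerates a missing comma."""
    # partition always returns a 3-tuple; if ',' is absent, sep and rest are ''.
    _, sep, rest = name.partition(',')
    if not sep or '.' not in rest:
        return 'Unknown'
    # Text between the comma and the first period, without surrounding whitespace.
    return rest.split('.')[0].strip()


print(get_title_safe('Braund, Mr. Owen Harris'))  # Mr
print(get_title_safe('Braund Mr. Owen Harris'))   # Unknown
```

Note that this version looks for the '.' only after the comma, so it is slightly stricter than the original check on the whole string.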

Upvotes: 4

ettanany

Reputation: 19806

I will explain based on your example:

name = 'Braund, Mr. Owen Harris'

if '.' in name:

Does name contain '.'? Yes, so you split name on ',', which returns:

['Braund', ' Mr. Owen Harris']

Now, you retrieve the second element (name.split(',')[1]) and split it on '.', which returns:

[' Mr', ' Owen Harris']

Then, you retrieve the first element (name.split(',')[1].split('.')[0]), which returns:

' Mr'

strip() is used to remove leading and trailing whitespace.

Then the final result:

'Mr'
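
To see what strip() does and does not touch, here is a small sketch. The sample string raw is made up for illustration; repr() is used so the whitespace stays visible.

```python
raw = '  \tMr. Owen \n'  # made-up sample with spaces, a tab and a newline

stripped = raw.strip()   # removes whitespace at both ends only
left = raw.lstrip()      # removes leading whitespace only
right = raw.rstrip()     # removes trailing whitespace only

print(repr(stripped))  # 'Mr. Owen'
print(repr(left))      # 'Mr. Owen \n'
print(repr(right))     # '  \tMr. Owen'
```

The space inside 'Mr. Owen' is kept, because strip() only works on the ends of the string.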

Upvotes: 0

agtoever

Reputation: 1699

What happens here is called method chaining. If a method returns an object, you can call another method on that returned object directly, without storing it in a variable first.

Let's break down that return line of code:

  • name.split(',') returns a list of items, where each ',' in the string is treated as a separator between list items. On 'Braund, Mr. Owen Harris', this returns the following list: ['Braund', ' Mr. Owen Harris']
  • The following [1] selects the second item in the list, which is a string object (' Mr. Owen Harris', including the leading space)
  • Next, split('.') splits that string again into a list, returning [' Mr', ' Owen Harris'].
  • Then, the first item is selected by [0], returning the string ' Mr'.
  • Finally, strip() removes all whitespace from the beginning and the end of the string, leaving 'Mr'.
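
The chain described above can be unrolled so that every link gets its own name. This is just a sketch; the variable names (parts, after_comma, pieces, raw_title, title) are made up for illustration.

```python
name = 'Braund, Mr. Owen Harris'

parts = name.split(',')          # list: ['Braund', ' Mr. Owen Harris']
after_comma = parts[1]           # str:  ' Mr. Owen Harris'
pieces = after_comma.split('.')  # list: [' Mr', ' Owen Harris']
raw_title = pieces[0]            # str:  ' Mr'
title = raw_title.strip()        # str:  'Mr'

# The chained one-liner produces exactly the same value.
assert title == name.split(',')[1].split('.')[0].strip()
print(title)  # Mr
```

Each method or index is applied to whatever the previous step returned, which is why the types alternate between list and str.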

Upvotes: 0

Yuval Ben-Arie

Reputation: 1290

You start with:

'Braund, Mr. Owen Harris'

The first split will find all ',' and split the string at those positions. So you get:

['Braund', ' Mr. Owen Harris']

Then you take the second element so you are left with:

' Mr. Owen Harris'

You then split this string by '.' and get:

[' Mr', ' Owen Harris']

After that you take the first element:

' Mr'

And strip it:

'Mr'

Upvotes: 0

Mohd

Reputation: 5613

You should do the splits one by one and see how it's going, for example:

name = 'Braund, Mr. Owen Harris'
name = name.split(',')[1] # this split gives ['Braund', ' Mr. Owen Harris']
                          # then it takes element 1, which is ' Mr. Owen Harris'
name = name.split('.')[0] # here the split gives [' Mr', ' Owen Harris']
                          # then it takes element 0, which is ' Mr'
name = name.strip()       # strip removes leading and trailing whitespace (the leading space in this case)

Upvotes: 1
