user1829708

Reputation: 73

Split string, unicode, unicode, string in python

I am trying to split a mix of str and unicode values in Python. The split has to be done on the ResultSet object retrieved from a website. With the code below I can fetch the details (they are user details):

from bs4 import BeautifulSoup
import urllib2
import re

url = "http://www.mouthshut.com/vinay_beriwal"
profile_user = urllib2.urlopen(url)
profile_soup = BeautifulSoup(profile_user.read())

usr_dtls = profile_soup.find("div",id=re.compile("_divAboutMe")).find_all('p')
for dt in usr_dtls:
    usr_dtls = " ".join(dt.text.split())
    print(usr_dtls)

The output is as below:

i love yellow..

Name: Vinay Beriwal
Age: 39 years
Hometown: New Delhi, India
Country: India
Member since: Feb 11, 2016

What I need is five distinct variables (Name, Age, Hometown, Country, Member since), each holding the value that follows the ':' on the corresponding line.

Thanks

Upvotes: 0

Views: 310

Answers (2)

Gurupad Mamadapur

Reputation: 989

You can use a dictionary to store name-value pairs. For example:

my_dict = {"Name":"Vinay","Age":21}

In my_dict, Name and Age are the keys of the dictionary. You can access the values like this:

print (my_dict["Name"])   #This will print Vinay

Also, it's better to use complete words for variable names:

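results = profile_soup.find("div", id=re.compile("_divAboutMe")).find_all('p')

user_data = {}  # maps each field name to its value
for result in results:
    # Collapse runs of whitespace into single spaces
    text = " ".join(result.text.split())
    try:
        # Split on the first ':' only, so values may contain colons
        var, value = text.split(':', 1)
        user_data[var.strip()] = value.strip()
    except ValueError:
        # No ':' in the line (e.g. the free-text bio), so skip it
        pass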


#If you print the user_data now
print (user_data)

'''
This is what it'll print
{'Age': '39 years', 'Country': 'India', 'Hometown': 'New Delhi, India', 'Name': 'Vinay Beriwal', 'Member since': 'Feb 11, 2016'}
'''
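
To get from the dictionary to the five separate variables the question asks for, something like the following might work. It is only a sketch written for Python 3: the sample lines are copied from the question's output instead of being fetched from the site, and parse_details is a hypothetical helper name, not part of BeautifulSoup. It uses str.partition, which splits on the first ':' only and returns an empty separator when there is none, so no try/except is needed.

```python
# Sample lines copied from the question's output; in the real script they
# would come from the text of the <p> tags that BeautifulSoup returns.
lines = [
    "i love yellow..",
    "Name: Vinay Beriwal",
    "Age: 39 years",
    "Hometown: New Delhi, India",
    "Country: India",
    "Member since: Feb 11, 2016",
]


def parse_details(lines):
    """Build a dict from 'key: value' lines (hypothetical helper)."""
    details = {}
    for line in lines:
        # partition splits on the first ':' only and never raises
        key, sep, value = line.partition(":")
        if sep:  # sep is '' when the line contains no colon
            details[key.strip()] = value.strip()
    return details


user_data = parse_details(lines)

# Pull the five values out into separate variables
name = user_data["Name"]
age = user_data["Age"]
hometown = user_data["Hometown"]
country = user_data["Country"]
member_since = user_data["Member since"]

print(name, "|", age, "|", hometown, "|", country, "|", member_since)
```

Indexing with user_data["Name"] raises KeyError if the profile lacks that field; user_data.get("Name") returns None instead, which may be the safer choice when scraping pages that vary.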

Upvotes: 2

ettanany

Reputation: 19806

You can use a dictionary to store your data:

my_dict = {}

for dt in usr_dtls:
    item = " ".join(dt.text.split())
    # Skip lines without a ':' (e.g. the free-text bio)
    if ':' in item:
        # Split on the first ':' only, so values may contain colons
        k, v = item.split(':', 1)
        my_dict[k.strip()] = v.strip()

Note: You should not reassign usr_dtls inside your for loop, because that overwrites your original usr_dtls.
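
A small sketch of what the note means, using plain strings as a stand-in for the <p> elements so it runs without the website (the names details and cleaned are made up for the illustration). The loop itself still finishes, because the iterator keeps its own reference to the original list, but afterwards the name no longer points to the list:

```python
# Stand-in for the list of <p> elements; plain strings keep the sketch runnable
usr_dtls = ["Name:   Vinay Beriwal", "Age:  39 years"]

for dt in usr_dtls:
    # Rebinding the name here does not stop the loop, but from now on
    # usr_dtls refers to a string instead of the original list
    usr_dtls = " ".join(dt.split())

print(type(usr_dtls).__name__)  # str
print(usr_dtls)                 # Age: 39 years

# Using a separate name inside the loop keeps the original list intact
details = ["Name:   Vinay Beriwal", "Age:  39 years"]
cleaned = []
for dt in details:
    cleaned.append(" ".join(dt.split()))

print(len(details), cleaned)
```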

Upvotes: 0
