Reputation: 965
Please pardon me, I am new to Python and Selenium. I am scraping a supermarket website. The item names I get include the quantity, as shown below. I want to separate the quantity from the name for the different cases below:
Cases
Fresh Value Colocasia 250g
Fresh Value Banana Robusta 1kg
Fresh Value Raw Papaya 1 U (units) (300g-400g)
Fresh Value Premium Pomegranate Kabul (500g - 700g)
Output Needed:
Name = Fresh Value Colocasia, Quantity = 250g
Name = Fresh Value Banana Robusta, Quantity = 1kg
Name = Fresh Value Raw Papaya, Quantity = 1 U (units) (300g-400g)
There are hundreds of items like this. I have tried using str.split() but didn't get the desired output. I have also tried regex, but I'm not sure how it works. Is there a way to split the string at the first number found in it? Any suggestions would help.
Upvotes: 1
Views: 364
Reputation: 31
import re

def substring(string):
    # Split once instead of re-splitting on every iteration.
    words = string.split()
    name = words[0]
    quantity = ""  # stays empty if no word contains a digit
    for i in range(1, len(words)):
        if re.search(r'\d', words[i]) is None:
            name = name + " " + words[i]
        else:
            # The first word containing a digit starts the quantity.
            quantity = " ".join(words[i:])
            break
    return {"Name": name, "Quantity": quantity}
Then put strings into this function like:
substring("Fresh Value Raw Papaya 1 U (units) (300g-400g)")
And you will get:
{'Name': 'Fresh Value Raw Papaya', 'Quantity': '1 U (units) (300g-400g)'}
Upvotes: 1
Reputation: 12417
One option (according to the data samples that you provided) can be this:
import re

strings = ['Fresh Value Colocasia 250g', 'Fresh Value Banana Robusta 1kg', 'Fresh Value Raw Papaya 1 U (units) (300g-400g)', 'Fresh Value Premium Pomegranate Kabul (500g - 700g)']
for i in strings:
    # The quantity starts at the first digit or opening parenthesis.
    start = re.findall(r'\d|\(', i)[0]
    # maxsplit=1 so a later repeat of the same character is not split again.
    name, rest = i.split(start, 1)
    print('Name = ' + name.strip() + ', Quantity = ' + start + rest)
Output:
Name = Fresh Value Colocasia, Quantity = 250g
Name = Fresh Value Banana Robusta, Quantity = 1kg
Name = Fresh Value Raw Papaya, Quantity = 1 U (units) (300g-400g)
Name = Fresh Value Premium Pomegranate Kabul, Quantity = (500g - 700g)
Of course, this is valid only if digits and parentheses appear only in the quantity and not in the name. If the quantity can start with other symbols, you can add them to the pattern passed to findall.
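As a rough sketch of the same idea, the string can also be sliced at the position of the first match instead of being split on the matched character. The function name split_name_quantity and the constant QUANTITY_START are made up for this illustration, and the code assumes, as above, that the quantity begins at the first digit or opening parenthesis:

```python
import re

# Assumption: the quantity begins at the first digit or "(" in the item string.
# Add other starting symbols to the character class if your data needs them.
QUANTITY_START = re.compile(r"[\d(]")

def split_name_quantity(item):
    """Return (name, quantity); quantity is "" when no start marker is found."""
    match = QUANTITY_START.search(item)
    if match is None:
        return item.strip(), ""
    # Slice at the match position rather than splitting on the matched character.
    return item[:match.start()].strip(), item[match.start():].strip()

for item in ["Fresh Value Colocasia 250g",
             "Fresh Value Premium Pomegranate Kabul (500g - 700g)"]:
    name, quantity = split_name_quantity(item)
    print("Name = " + name + ", Quantity = " + quantity)
```

Slicing by position avoids surprises when the first matched character shows up again later in the string, and the None check keeps items without any quantity from raising an IndexError.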
Upvotes: 1
Reputation: 1178
You can also try this:
import re

def split_unit(stri):
    # First run of digits in the string.
    to_split = re.findall(r"\d+", stri)[0]
    # Everything from that number onward is the quantity.
    splitted = to_split + stri.split(to_split, 1)[1]
    print(splitted)

split_unit("Fresh Value Colocasia 250g")  # outputs: 250g
split_unit("Fresh Value Banana Robusta 1kg")  # outputs: 1kg
split_unit("Fresh Value Raw Papaya 1 U (units) (300g-400g)")  # outputs: 1 U (units) (300g-400g)
And so on. What I've done is first find the first occurrence of an integer in the string using a regex (the first line inside the function), then use str.split() to take everything after that integer and concatenate it with to_split, which is the integer itself.
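If you want the name and the quantity in one step, a possible variation is re.split with a lookahead, so the string is cut once at the whitespace just before the quantity. This is only a sketch: split_at_quantity is a made-up name, and it assumes the quantity starts with a digit or an opening parenthesis and is preceded by whitespace.

```python
import re

def split_at_quantity(item):
    # Split once, on whitespace that is followed by a digit or "(".
    # The lookahead (?=...) checks the next character without consuming it,
    # so the quantity keeps its leading digit or parenthesis.
    parts = re.split(r"\s+(?=[\d(])", item, maxsplit=1)
    name = parts[0]
    # Assumption: an item with no quantity should give an empty string.
    quantity = parts[1] if len(parts) > 1 else ""
    return name, quantity

name, quantity = split_at_quantity("Fresh Value Raw Papaya 1 U (units) (300g-400g)")
print("Name = " + name + ", Quantity = " + quantity)
```

The maxsplit=1 argument matters here, because a range such as (500g - 700g) contains more whitespace followed by a digit further along.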
Upvotes: 0