Nurdin

Reputation: 23883

How to extract string that contains specific characters in Python

I'm trying to extract only the one string that contains the $ character. The input is the text I extracted with BeautifulSoup.

Code

import re

text = soup_content.find('blockquote', {"class": "postcontent restore"}).text
price = [m.split() for m in re.findall(r"\w+/$(?:\s+\w+/$)*", text)]
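The pattern above writes /$, but in a regex the escape character is a backslash, and a bare $ is an end-of-string anchor, so the pattern can never match a dollar sign. A minimal sketch of the same findall approach with the escape fixed, assuming text holds the extracted post body (the sample string here is just an excerpt of the post):

```python
import re

# Excerpt of the post, standing in for the BeautifulSoup output.
text = (
    "it was several hundred $ to purchase after the fact.\n"
    "Asking $2000 obo. PayPal shipped. CONUS."
)

# Escape the dollar sign as \$ and capture the word characters right after it.
# The lone "$ to" is skipped because \w+ needs at least one word character.
prices = re.findall(r"\$(\w+)", text)
print(prices)  # ['2000']
```

\w+ also matches letters; use \d+ instead if only numeric amounts should count.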

Input

For Sale is my Tag Heuer Carrera Calibre 6 with box and papers and extras.
39mm
47 ish lug to lug
19mm in between lugs
Pretty thin but not sure exact height. Likely around 12mm (maybe less)
I've owned it for about 2 years. I absolutely love the case on this watch. It fits my wrist and sits better than any other watch I've ever owned. I'm selling because I need cash and other pieces have more sentimental value
I am the second owner, but the first barely wore it.
It comes with barely worn blue leather strap, extra suede strap that matches just about perfectly and I'll include a blue Barton Band Elite Silicone.
I also purchased an OEM bracelet that I personally think takes the watch to a new level. This model never came with a bracelet and it was several hundred $ to purchase after the fact.
The watch was worn in rotation and never dropped or knocked around.
The watch does have hairlines, but they nearly all superficial. A bit of time with a cape cod cloth would take care of a lot it them. The pics show the imperfections in at "worst" possible angle to show the nature of scratches.
The bracelet has a few desk diving marks, but all in all, the watch and bracelet are in very good shape.
Asking $2000 obo. PayPal shipped. CONUS.
It's a big hard to compare with others for sale as this one includes the bracelet.

The expected output is:

2000

Upvotes: 1

Views: 2641

Answers (3)

Yoel Nisanov

Reputation: 1054

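I would do something like this (assuming text holds the string you posted above):

price_start = text.find('$')
price = text[price_start + 1:].split()[0]

This works only if there is exactly one $ in the text, as you said. Note that in the post above the first $ is actually the standalone one in "several hundred $", so find() lands there rather than on $2000.

Alternatively, you could use a regex like this:

import re

price = re.findall(r'\S*\$\S*\d', text)[0]
price = price.replace('$', '')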
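If the text can contain more than one $, a minimal sketch that extends the find() idea: keep searching with str.find(sub, start) until a $ directly followed by digits turns up. The helper name first_dollar_amount and the sample string are hypothetical.

```python
def first_dollar_amount(text):
    """Return the digits after the first '$' that is followed by a digit, else None."""
    pos = text.find('$')
    while pos != -1:
        # Advance over the digits that directly follow this '$'.
        end = pos + 1
        while end < len(text) and text[end].isdigit():
            end += 1
        if end > pos + 1:
            return text[pos + 1:end]
        # Lone '$' (e.g. "several hundred $"): search again after it.
        pos = text.find('$', pos + 1)
    return None


sample = "several hundred $ to purchase. Asking $2000 obo."
print(first_dollar_amount(sample))  # 2000
```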

Upvotes: 0

Austin

Reputation: 26039

You don't need a regex. Instead, you can iterate over the lines and over each word, check whether the word starts with '$', and extract the rest of it:

[word[1:] for line in s.split('\n') for word in line.split() if word.startswith('$') and len(word) > 1]

where s is your paragraph. This outputs:

['2000']
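Note that s.split() with no arguments already splits on newlines, so the outer loop over lines isn't strictly needed. A sketch of the same idea as a reusable function that also strips trailing punctuation, assuming a price may appear as "$2000." at the end of a sentence; the function name dollar_words is hypothetical:

```python
def dollar_words(s):
    """Return the text after '$' for every whitespace-separated word starting with '$'."""
    results = []
    for word in s.split():  # no argument: splits on spaces, tabs and newlines
        if word.startswith('$') and len(word) > 1:
            # Drop the '$' and any trailing punctuation such as "." or ",".
            amount = word[1:].rstrip('.,;:!?')
            if amount:
                results.append(amount)
    return results


print(dollar_words("Asking $2000. Or $1500, if local. Not $."))  # ['2000', '1500']
```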

Upvotes: 1

marcos

Reputation: 4510

Since this is very simple, you don't need a regex; this should suffice:

words = text.split()
words_with_dollar = [word for word in words if '$' in word]
print(words_with_dollar)

['$', '$2000']

If you don't want the dollar sign alone, you can add a filter like this:

words_with_dollar = [word for word in words if '$' in word and word != '$']
print(words_with_dollar)

['$2000']
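If you need the amounts as numbers rather than strings, a minimal sketch of post-processing the filtered words, assuming each one is a whole-dollar amount that may carry thousands separators or trailing punctuation (the sample text and the helper name to_int are hypothetical):

```python
def to_int(word):
    # '$1,250.' -> '1,250.' -> '1,250' -> '1250' -> 1250
    return int(word.replace('$', '').rstrip('.,').replace(',', ''))


text = "several hundred $ to purchase. Asking $2000 obo. Was $1,250 new."
# Same filter as above: words containing '$', excluding a lone '$'.
words_with_dollar = [word for word in text.split() if '$' in word and word != '$']
amounts = [to_int(word) for word in words_with_dollar]
print(amounts)  # [2000, 1250]
```

int() raises ValueError for anything that isn't a plain integer after cleanup (e.g. "$19.99"), so use float() or decimal.Decimal if cents can appear.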

Upvotes: 0
