sojkin

Reputation: 33

How to exclude newline mark from requests.get().text

I'm trying to extract the numbers from the response of http://app.lotto.pl/wyniki/?type=dl with the code below:

import requests
import re

url = 'http://app.lotto.pl/wyniki/?type=dl'
p = re.compile(r'[^\d{4}\-\d{2}\-\d{2}]\d+')

response = requests.get(url)
data = re.findall(p, response.text)
print(data)

But instead of ['7', '46', '8', '43', '9', '47'], I'm getting ['\n7', '\n46', '\n8', '\n43', '\n9', '\n47']. How can I get rid of the "\n"?

Upvotes: 2

Views: 5203

Answers (3)

Wiktor Stribiżew

Reputation: 627103

Your regex is not appropriate because [^\d{4}\-\d{2}\-\d{2}]\d+ matches any single character other than a digit, {, } or - (the 4 and 2 are already covered by \d), followed by 1 or more digits. In other words, you turned a sequence into a character class. That negated character class can match a newline, and it can also match any letter and a lot more. strip() will not help in those other contexts; you need to fix the regular expression.
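To see the character-class behaviour in isolation, here is a minimal sketch. The sample string is made up for illustration (a date line followed by one number per line); the real layout of the response is an assumption.

```python
import re

# Hypothetical sample; the real page layout is assumed, not verified.
sample = "2016-04-30\n7\n46\n8\n43\n9\n47"

# The brackets make this a negated character class: one character that is
# not a digit, '{', '}' or '-', followed by one or more digits.
broken = re.compile(r'[^\d{4}\-\d{2}\-\d{2}]\d+')

# The newline before each number is consumed as part of the match.
newline_matches = broken.findall(sample)
# A letter in front of the digits is consumed in the same way.
letter_matches = broken.findall("x12 y345")

print(newline_matches)
print(letter_matches)
```

Each result carries the one extra character that the class consumed, which is why the question's output starts every item with "\n".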

Use

r'(?<!-)\b\d+\b(?!-)'

See the regex and IDEONE demo

This pattern matches 1 or more digits (\d+) that form a whole word (\b on both sides, so no adjacent word characters) and that are neither preceded by a hyphen ((?<!-)) nor followed by one ((?!-)).

Your code will look like:

import requests
import re

url = 'http://app.lotto.pl/wyniki/?type=dl'
p = re.compile(r'(?<!-)\b\d+\b(?!-)')

response = requests.get(url)
data = p.findall(response.text)
print(data)
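The pattern can also be checked without a network request. This is only a sketch: the sample string below is hypothetical and assumes the response is a date line followed by one number per line.

```python
import re

# Hypothetical sample; the real page layout is assumed, not verified.
sample = "2016-04-30\n7\n46\n8\n43\n9\n47"

# Whole-word digit runs that have no hyphen directly before or after them.
fixed = re.compile(r'(?<!-)\b\d+\b(?!-)')

numbers = fixed.findall(sample)
print(numbers)  # ['7', '46', '8', '43', '9', '47']
```

The date parts 2016, 04 and 30 are skipped because each one touches a hyphen, and no newline is captured because the pattern matches digits only.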

Upvotes: 3

rock321987

Reputation: 11042

You can strip the \n using the strip() method:

data = [x.strip() for x in re.findall(p, response.text)]

I am assuming that \n can appear at the beginning as well as at the end of a match.

Upvotes: 2

AlessioX

Reputation: 3177

Since your numbers are strings, you can use the lstrip() string method. It removes whitespace, including newline and carriage-return characters, from the left side of a string (hence the l in lstrip).
You can try something like:

print([item.lstrip() for item in data])

to remove your newlines.

Alternatively, you can overwrite data with the stripped version of itself:

data = [item.lstrip() for item in data]

and then simply print(data).
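Note that lstrip() only touches the left side. A small sketch with made-up items (the trailing whitespace here is hypothetical, not something seen in the question's output) shows the difference from strip():

```python
# Hypothetical matches with whitespace on either side.
raw = ['\n7', '\n46\n', ' 8 ']

# lstrip() removes leading whitespace only; trailing whitespace survives.
left_only = [item.lstrip() for item in raw]

# strip() removes whitespace from both ends.
both_sides = [item.strip() for item in raw]

print(left_only)   # ['7', '46\n', '8 ']
print(both_sides)  # ['7', '46', '8']
```

For the output shown in the question, where the newline is always in front, both methods give the same result.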

Upvotes: 0
