Reputation: 414
I am having trouble splitting each element within a nested list. I used this method for my first split, and I want to apply another split to the resulting nested list. I thought I could simply reuse the same line of code with a few modifications, goal2 = [[j.split("") for j in goal]], but I keep getting a common error: 'list' object has no attribute 'split'. I know that you cannot split a list, but I do not understand how my modification differs from the linked method. This is my first web-scraping project, and I am only after the phone numbers on the website. I'd like help fixing my issue rather than entirely new code, so that I can continue to learn and improve my own methods.
import requests
import re
from bs4 import BeautifulSoup
source = requests.get('https://www.pickyourownchristmastree.org/ORxmasnw.php').text
soup = BeautifulSoup(source, 'lxml')
info = soup.findAll(text=re.compile("((?:\d{3}|\(\d{3}\))?(?:\s|-|\.)?\d{3}(?:\s|-|\.)\d{4})"))[:1]
goal = [i.split(".") for i in info]
goal2 = [[j.split("") for j in goal]]
for x in goal:
    del x[2:]
for y in goal:
    del y[:1]
print('info:', info)
print('goal:', goal)
Output without the goal2 variable:
info: ['89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720. Open: ']
goal: [[' Phone: 503-325-9720']]
Desired output with the goal2 variable:
info: ['89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720. Open: ']
goal: [[' Phone: 503-325-9720']]
goal2: ['503-325-9720']
I will obviously have more numbers, but I didn't want to clutter the post. So it would look something more like this:
goal2: ['503-325-9720', '###-###-####', '###-###-####', '###-###-####']
But I want to make sure that each number can be exported into a new row within a CSV file. So when I create a CSV file with the header "Phone", each number above should be in a separate row and not clustered together. I am thinking that I might need to change my code to use a for loop?
Upvotes: 1
Views: 93
Reputation: 13888
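As for why the second comprehension fails: after the first split, goal is a list of lists, so each j in "for j in goal" is itself a list, and lists have no split method. Separately, split("") would raise "ValueError: empty separator" even on a string. Below is a minimal sketch of a fix in the same style as your code. It assumes the sample string from the question, that the phone number is always the second period-separated piece, and that "Phone: " is a usable separator; none of that is guaranteed for every entry on the page.

```python
info = ['89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720. Open: ']

# First split, as in the question: one list of pieces per string.
goal = [i.split(".") for i in info]

# Keep only the piece holding the phone number (index 1 for this sample).
goal = [pieces[1:2] for pieces in goal]

# goal is a list of lists, so loop twice: over the outer list, then over
# the strings inside each inner list. Only the strings can be split.
# The separator "Phone: " is an assumption based on the sample text.
goal2 = [s.split("Phone: ")[1] for pieces in goal for s in pieces]

print('goal:', goal)
print('goal2:', goal2)
```

This relies on fixed positions, so an address containing an extra period (for example "St.") would shift the index and break it.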
The cleaner approach here would be to just do another regex search on your info, e.g.:
pat = re.compile(r'\d{3}\-\d{3}\-\d{4}')
goal = [pat.search(i).group() for i in info if pat.search(i)]
Outputs:
goal: ['503-325-9720']
Or, if there is more than one number per line:
# use a capturing group with findall instead
pat = re.compile(r'(\d{3}\-\d{3}\-\d{4})')
goal = [pat.findall(i) for i in info]
Outputs:
goal: [['503-325-9720', '123-456-7890']]
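To get one number per CSV row, the nested result can be flattened first and then written with the standard csv module. The following is only a sketch: the second entry in info and its numbers are made up for illustration, write_phones is a hypothetical helper name, and the file name phones.csv in the comment is a placeholder.

```python
import csv
import io
import re

info = [
    '89426 Green Mountain Road, Astoria, OR 97103. Phone: 503-325-9720. Open: ',
    # Hypothetical second entry, only here to show several numbers per line.
    'Made-up farm entry. Phone: 123-456-7890 or 123-456-7891.',
]

pat = re.compile(r'\d{3}-\d{3}-\d{4}')

# Flatten while matching: one phone number per list element.
phones = [number for line in info for number in pat.findall(line)]


def write_phones(handle, numbers):
    """Write a 'Phone' header, then one number per row."""
    writer = csv.writer(handle)
    writer.writerow(['Phone'])
    writer.writerows([n] for n in numbers)


# In-memory demonstration so the sketch does not touch the disk.
buffer = io.StringIO(newline='')
write_phones(buffer, phones)
print(buffer.getvalue())

# To write a real file instead:
# with open('phones.csv', 'w', newline='') as f:
#     write_phones(f, phones)
```

Each number is wrapped in its own single-element list because csv.writer treats every row as a sequence of columns; passing a bare string would split it into one character per column.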
Upvotes: 1