Reputation: 143
I'm using Python 3 and MongoDB 2.6 and trying to insert some data into my collection. Here is the sample code:
from urllib.parse import urlparse
from bs4 import BeautifulSoup
import requests
from pymongo import MongoClient

urlList = ['http://....'] #bunch of URLs
jsArray = []
cssArray = []
client = MongoClient('127.0.0.1', 28017)
db = client.tagFinderProject # Getting the DB
collection = db.tegFinder # Getting the Collection

for url in urlList:
    parsed_uri = urlparse(url)
    domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    r = requests.get(url)
    data = r.text
    soup = BeautifulSoup(data)
    for lines in soup.find_all('script'):
        if lines.get('src') is not None and '.js' in lines.get('src') and 'http' in lines.get('src'):
            jsArray.append(lines.get('src'))
        elif str(lines.get('src')).startswith('//'):
            jsArray.append('http:' + lines.get('src'))
        elif lines.get('src') is not None and '.js' in lines.get('src') and 'http' not in lines.get('src'):
            jsArray.append(domain + lines.get('src'))
    for lines in soup.find_all('link'):
        if lines.get('href') is not None and (lines.get('href')).endswith('.css') and 'http' in lines.get('href'):
            cssArray.append(lines.get('href'))
        elif lines.get('href') is not None and (lines.get('href')).endswith('.css') and 'http' not in lines.get('href'):
            cssArray.append(domain + lines.get('href'))

uniqueJS = list(set(jsArray))
uniqueCSS = list(set(cssArray))
for js in uniqueJS:
    collection.insert('JS: ', js)
for css in uniqueCSS:
    collection.insert('CSS: ', css)
Of course, before I run this I start my MongoDB server, and here is what it says:
2015-05-13T11:25:03.942-0500 [initandlisten] options: { net: { http: { RESTInterfaceEnabled: true, enabled: true } }, storage: { dbPath: "D:\Projects\mongoDB" } }
2015-05-13T11:25:03.944-0500 [initandlisten] journal dir=D:\Projects\mongoDB\journal
2015-05-13T11:25:03.944-0500 [initandlisten] recover : no journal files present, no recovery needed
2015-05-13T11:25:04.045-0500 [initandlisten] waiting for connections on port 27017
2015-05-13T11:25:04.045-0500 [websvr] admin web console waiting for connections on port 28017
When I run the above Python code, I get:
File ".../TagFinder/tagFinder.py", line 91, in <module>
collection.insert('JS: ', js)
File "C:\Python34\lib\site-packages\pymongo\collection.py", line 1924, in insert
with self._socket_for_writes() as sock_info:
File "C:\Python34\lib\contextlib.py", line 59, in __enter__
return next(self.gen)
File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 663, in _get_socket
server = self._get_topology().select_server(selector)
File "C:\Python34\lib\site-packages\pymongo\topology.py", line 121, in select_server
address))
File "C:\Python34\lib\site-packages\pymongo\topology.py", line 97, in select_servers
self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: connection closed
I can't work out why I'm getting this. I can insert data from the command prompt, and I can display it at 127.0.0.1/tagFinderProject/tagFinder/
Can anyone point me in the right direction?
EDIT 1:
If I change client = MongoClient('127.0.0.1', 28017)
to client = MongoClient('mongodb://127.0.0.1:27017/')
I get:
TypeError: 'str' object does not support item assignment
Referring to: collection.insert('JS: ', js)
Upvotes: 1
Views: 8894
Reputation: 143
Found the problem. Feeling dumb thanks to my typo.
1) Collection name: the code uses tegFinder, but I was trying to view 127.0.0.1/tagFinderProject/tagFinder/
2) MongoDB can't insert plain strings, only dicts, meaning each document needs a key: value pair. So I've changed it to:
dictJS = {'JS: ': js}
collection.insert(dictJS)
3) Not part of the solution, but I've left the connection arguments empty, so it uses the defaults (localhost, port 27017):
client = MongoClient() # instead of client = MongoClient('mongodb://127.0.0.1:27017', 28017)
db = client.tagFinderProject # Getting the DB
collection = db.tegFinder
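For anyone landing here later, the points above can be put together in a rough sketch. It assumes PyMongo 3.x or newer, where insert_one / insert_many are the recommended replacements for the deprecated insert. The field names "type" and "url" and the helper names build_documents and store_documents are made up for illustration and are not from my original script. Port 27017 is the one mongod reported for client connections in the log above; 28017 is only the admin web console.

```python
def build_documents(js_urls, css_urls):
    """Wrap plain URL strings in dicts, since MongoDB stores documents, not bare strings."""
    # "type" and "url" are hypothetical field names chosen for this sketch.
    docs = [{"type": "JS", "url": u} for u in sorted(set(js_urls))]
    docs += [{"type": "CSS", "url": u} for u in sorted(set(css_urls))]
    return docs


def store_documents(docs, uri="mongodb://127.0.0.1:27017/"):
    """Insert the documents into tagFinderProject.tagFinder (needs a running mongod)."""
    # Imported here so build_documents can be tried without pymongo installed.
    from pymongo import MongoClient

    # Fail after 5 seconds instead of the default 30 if the server is unreachable.
    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    collection = client.tagFinderProject.tagFinder
    if docs:  # insert_many rejects an empty list
        collection.insert_many(docs)


docs = build_documents(
    ["http://example.com/a.js", "http://example.com/a.js"],
    ["http://example.com/site.css"],
)
print(docs)
# store_documents(docs)  # uncomment once mongod is listening on 27017
```

Storing the kind ("JS" or "CSS") as a value rather than as the key makes it easier to query later, for example collection.find({"type": "JS"}), but that is a design preference and not something the fix depends on.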
Upvotes: 1