Reputation: 105
My idea is the following:
Documents
) which contains some texts, e.g. blog posts. Each document has a unique id text_id
.Links
) which stores unique urls that appear in these posts. Each url has a unique id url_id
.Association
), which maps an id of a text to an id of a domain.I want to be able to get the posts, collect urls from those, and then:
Documents
Links
and relate to the document through Association
For starters, I created three classes like here:
class Association(Base):
__tablename__ = 'association'
text_id = Column('text_id', Integer, ForeignKey('left.text_id'), primary_key=True)
url_id = Column('url_id', Integer, ForeignKey('right.url_id'), primary_key = True)
child = relationship("Links", back_populates='parents')
parent = relationship("Documents", back_populates='children')
class Documents(Base):
__tablename__ = 'left'
text_id = Column(Integer, primary_key=True, unique=True)
text = Column(Text)
children = relationship("Association", back_populates='parent')
class Links(Base):
__tablename__ = 'right'
url_id = Column(Integer, primary_key=True, autoincrement=True, unique=True)
url = Column(Text, unique=True)
parents = relationship('Association', back_populates = 'child')
Base.metadata.create_all(engine)
Then I am trying to load the data:
data = [
{'id':1, 'text':'sometext', 'url':'facebook.com'},
{'id':2, 'text':'sometext', 'url':'twitter.com'},
{'id':3, 'text':'sometext', 'url':'twitter.com'}
]
for row in data:
d = Document(text_id = row['id'])
a = Association()
a.child = Links(url = row['url'])
d.children.append(a)
session.add(d)
session.commit()
Which results in an error:
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.7.12/envs/myenv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-13-325b1cd57576>", line 5, in <module>
p.children.append(a)
File "/home/user/.pyenv/versions/3.7.12/envs/myenv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 1240, in __getattr__
return self._fallback_getattr(key)
File "/home/user/.pyenv/versions/3.7.12/envs/myenv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 1214, in _fallback_getattr
raise AttributeError(key)
AttributeError: append
I really don't get why, as it seems I have done everything like an official documentation suggests.
For another thing, even if this works, I suspect that appending an already existing url through p.children.append(a)
might result in an error, as it will essentially try to create a duplicate, and Links
does not allow that.
I'm using mySQL and MariaDB if that matters.
Perhaps I have chosen a wrong tool for the job - I would be grateful if you could suggest what may be an alternative.
UPD: I could not insert because I instantiated a base with automap_base()
instead of declarative_base()
. Now I can append, however, the duplicate entries are indeed a problem:
sqlalchemy.exc.IntegrityError: (pymysql.err.IntegrityError) (1062, "Duplicate entry 'twitter.com' for key 'url'")
[SQL: INSERT INTO `right` (url) VALUES (%(url)s)]
[parameters: {'url': 'twitter.com'}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
Upvotes: 1
Views: 350
Reputation: 1346
First, it will be easier to debug if you use proper domain names instead of: right
, left
, child
, children
. I know that is a copy from docs, but docs are generic while your case is specific. Your code will be much more readable.
To avoid duplicates you should check before insert that record already exist (Documents
have unique text_id
, Links
have unique url
).
for row in data:
d = session.query(Document).filter_by(text_id=row['id']).first()
if not d:
d = Document(text_id=row['id'])
link = session.query(Links).filter_by(url=row['url']).first():
if not link:
link = Links(url=row['url'])
a = Association(child=link)
d.children.append(a)
session.add(d)
session.flush()
session.commit()
Upvotes: 1