Reputation: 5307
I am learning more and more about Python, moving away from less efficient looping over lists of lists, and have been learning numpy/pandas.
Let's say I have a pandas DataFrame df, for example something that looks like:
| id | Year | Value |
|----|------|-------|
| 1  | 2015 | 1.00  |
| 1  | 2014 | 0.80  |
| 2  | 2015 | 1.10  |
| 3  | 2015 | 0.60  |
etc
Now how would I efficiently, in Python, query my DB based on the values in a specific column and then append the queried values as a new column? Right now my code looks something like this:
uIDs = set(df['id'])
uIDs = ",".join(str(x) for x in uIDs)
with dbconnection as con:
    data = con.execute("SELECT id, value2 FROM table WHERE id IN (" + uIDs + ")")
    for row_id, value2 in data.fetchall():
        # Use .loc -- chained indexing like df[df['id'] == i[0]]['newcol'] = ...
        # assigns to a temporary copy and leaves df unchanged
        df.loc[df['id'] == row_id, 'newcol'] = value2
Sorry for any errors in the code; I was typing it from memory to give an example.
As you can see, I am trying to pull all unique IDs from the pandas DataFrame, select the new values from the database, and then loop through them to append a new column to the DataFrame.
Is there a more effective/efficient/pythonic way to do this? If you are looking at not just 3 rows but 3,000 or 30,000, is this the best way to approach it?
Upvotes: 2
Views: 155
Reputation: 37938
Since you want an SQL-style left join, wrap the query result in a DataFrame first (merge cannot consume the raw list of tuples that fetchall() returns) and use
pd.merge(df, pd.DataFrame(data.fetchall(), columns=['id', 'value2']), on='id', how='left')
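A minimal end-to-end sketch of this approach, using a hard-coded list of tuples to stand in for data.fetchall() (the column name value2 is assumed from the question's query):

```python
import pandas as pd

# The frame from the question
df = pd.DataFrame({'id': [1, 1, 2, 3],
                   'Year': [2015, 2014, 2015, 2015],
                   'Value': [1.00, 0.80, 1.10, 0.60]})

# A DB-API cursor's fetchall() returns a list of tuples; wrap it
# in a DataFrame so merge can join on the 'id' column
fetched = [(1, 5.0), (2, 6.0)]  # stand-in for data.fetchall()
lookup = pd.DataFrame(fetched, columns=['id', 'value2'])

# how='left' keeps every row of df; ids missing from the query get NaN
merged = df.merge(lookup, on='id', how='left')
print(merged)
```

This replaces the row-by-row loop with a single vectorized join, so the cost no longer grows with one boolean-mask scan per fetched row.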
Upvotes: 2