Q: Pandas dataframe from for loop

Question

EDIT 2, 9/1 See my answer below!

Pretty new at Python and Pandas here. I've got a script here that uses a for loop to query my database using each line in my list. That all works great, but I can't figure out how to build a data frame from the results of that loop. Any and all pointers are welcome!

#Remove stuff
print "Cleaning list"
def multiple_replacer(key_values):
  replace_dict = dict(key_values)
  replacement_function = lambda match: replace_dict[match.group(0)]
  pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
  return lambda string: pattern.sub(replacement_function, string)

multi_line = multiple_replacer(key_values)

print "Querying Database..."
for line in source:
  brand_url = multi_line(line)
  #Run Query with cleaned list
  mysql_query = ("select ub.url as 'URL', b.name as 'Name', b.id as 'ID' from api.brand b join api.url_brand ub on b.id=ub.brand_id where ub.url like '%%%s%%' and b.deleted=0 group by 3;" % brand_url)
  list1 = []
  brands = my_query('prod', mysql_query)

print "Writing CSV..."
#Create DF and CSV
for row in brands:
  list1.append({"URL":row['URL'],"Name":['Name'],"ID":['ID']})
  if brands.shape == (3,0):
    df1 = pd.DataFrame(data = brands, columns=['URL','Name','ID'])
  output = df1.to_csv('ongoing.csv',index=False)

EDIT 8/30 Here is my edit, attempting to use zyxue's method:

#Remove stuff
print "Cleaning list"
def multiple_replacer(key_values):
  replace_dict = dict(key_values)
  replacement_function = lambda match: replace_dict[match.group(0)]
  pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
  return lambda string: pattern.sub(replacement_function, string)

multi_line = multiple_replacer(key_values)

print "Querying Database..."
for line in source:
  brand_url = multi_line(line)
  #Run Query with cleaned list
  mysql_query = ("select ub.url as 'URL', b.name as 'Name', b.id as 'ID' from api.brand b join api.url_brand ub on b.id=ub.brand_id where ub.url like '%%%s%%' and b.deleted=0 group by 3;" % brand_url)
  brands = my_query('prod', mysql_query)

print "Writing CSV..."
#Create DF and CSV
records = []
for row in brands:
  records.append({"URL":row['URL'],"Name":['Name'],"ID":['ID']})
  if brands.shape == (3,0):
    records.append(dict(zip(brands, ['URL', 'Name', 'ID'])))
df1 = pd.DataFrame.from_records(records)
output = df1.to_csv('ongoing.csv', index=False)

but this only returns a blank CSV. I'm sure I'm applying it wrong.

sstewart · Accepted Answer

Okay, I figured it out, and I thought I should post the working script. @zyxue was pretty much right.

source = open('urls.txt')
key_values = ("http://",""), ("https://",""), ("www.",""), ("
","")

#Remove stuff
print "Cleaning list"
def multiple_replacer(key_values):
  replace_dict = dict(key_values)
  replacement_function = lambda match: replace_dict[match.group(0)]
  pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
  return lambda string: pattern.sub(replacement_function, string)

multi_line = multiple_replacer(key_values)

print "Querying Database..."
records = []
for line in source:
  brand_url = multi_line(line)
  #Run Query with cleaned list
  mysql_query = ("select ub.url as 'URL', b.name as 'Name', b.id as 'ID' from api.brand b join api.url_brand ub on b.id=ub.brand_id where ub.url like '%%%s%%' and b.deleted=0 group by 3;" % brand_url)
  brands = my_query('prod', mysql_query)
  #Append results to dict (records)
  for row in brands:
      records.append({"URL":row['URL'],"Name":row['Name'],"ID":row['ID']})

#Create DataFrame
df = pd.DataFrame.from_dict(records)

#Create CSV    
output = df.to_csv('ongoing.csv',index=False)

Essentially, I needed to layer the second for loop under the first and create the 'records' dictionary before the looping began. This causes an append to the dictionary for every line in 'source'. Seems like a pretty simple concept now!

Q: Pandas dataframe from for loop

Answers (2)

Related Questions