Reputation: 159
I'm trying to place a list that I created from reading in a text file into a pandas DataFrame, but it's not working for some reason. Below you will find some test data and my functions. The first piece of code does some checking and splitting, and the second part appends each result to a list called data. Here is some test data:
product/productId: B001E4KFG0
review/userId: A3SGXH7AUHU8GW
review/profileName: delmartian
review/helpfulness: 1/1
review/score: 5.0
review/time: 1303862400
review/summary: Good Quality Dog Food
review/text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.
product/productId: B00813GRG4
review/userId: A1D87F6ZCVE5NK
review/profileName: dll pa
review/helpfulness: 0/0
review/score: 1.0
review/time: 1346976000
review/summary: Not as Advertised
review/text: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".
Here is my code:
import pandas as pd
import numpy as np
def grab_next_entry(food_file):
    record={'id':-1,'helpfulness':'','number rated':'','score':'','review':''}
    line=food_file.readline()
    #food_dataframe=pd.DataFrame(columns=column_names)
    while line:
        if 'product/productId' in line:
            split_product_id=line.split(':')
            record['id']=split_product_id[1]
        if 'review/helpfulness' in line:
            split_helpfulness=line.split(':')
            split_helpfulness=split_helpfulness[1].split('/')
            record['helpfulness']=eval(split_helpfulness[0])
            record['number rated']=eval(split_helpfulness[-1])
        if 'review/score' in line:
            split_score = line.split(':')
            record['score']=eval(split_score[1])
        if 'review/text' in line:
            split_review_text=line.split('review/text:')
            record['review']=split_review_text[1:]
        if line=='\n':
            return record
        line=food_file.readline()
The next piece of code is creating the list and trying to put it into a pandas dataframe.
import os
fileLoc = "/Users/brawdyll/Documents/ds710fall2017assignment11/finefoods_test.txt"
column_names=('Product ID', 'People who voted Helpful','Total votes','Rating','Review')
food_dataframe=[]
data=[]
with open(fileLoc,encoding = "ISO 8859-1") as food_file:
    fs=os.fstat(food_file.fileno()).st_size
    num_read = 0
    while not food_file.tell()==fs:
        data.append(grab_next_entry(food_file))
        num_read+=1
Food_dataframe = pd.DataFrame(data,column_names)
print(Food_dataframe)
Upvotes: 0
Views: 78
Reputation: 2981
There are a lot of improvements that could be made to this code, but the reason your program isn't working is that you're passing column_names as the second positional argument to pd.DataFrame, which sets the index rather than the columns. Running:
df = pd.DataFrame(data)
will work just fine, and then setting:
df.columns = column_names
will give you the results you want.
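To make that concrete, here is a minimal sketch of the failure and the fix. The two records are made-up stand-ins shaped like the dicts that grab_next_entry returns, not the real file contents, and the rename mapping at the end is my own suggestion rather than something from your code:

```python
import pandas as pd

# Hypothetical records shaped like the dicts grab_next_entry builds.
data = [
    {'id': 'B001E4KFG0', 'helpfulness': 1, 'number rated': 1,
     'score': 5.0, 'review': 'Good quality dog food.'},
    {'id': 'B00813GRG4', 'helpfulness': 0, 'number rated': 0,
     'score': 1.0, 'review': 'Not as advertised.'},
]
column_names = ('Product ID', 'People who voted Helpful',
                'Total votes', 'Rating', 'Review')

# The second positional argument of pd.DataFrame is `index`, not `columns`,
# so five labels get matched against two rows and pandas raises an error.
try:
    pd.DataFrame(data, column_names)
    failed = False
except ValueError as exc:
    failed = True
    print('DataFrame(data, column_names) failed:', exc)

# Fix from the answer: build the frame first, then relabel the columns.
df = pd.DataFrame(data)
df.columns = list(column_names)

# Alternative (an assumption on my part): rename by key, which does not
# depend on the order in which the columns happen to come out.
mapping = {
    'id': 'Product ID',
    'helpfulness': 'People who voted Helpful',
    'number rated': 'Total votes',
    'score': 'Rating',
    'review': 'Review',
}
df_renamed = pd.DataFrame(data).rename(columns=mapping)
print(df_renamed)
```

Assigning df.columns relabels by position, so it relies on the columns coming out in the same order as column_names. Renaming with an explicit mapping is a bit more typing, but it keeps working if that order ever changes.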
Upvotes: 1