J. McCraiton

Reputation: 159

Create Pandas dataframe using List

I'm trying to place a list that I created by reading in a text file into a pandas DataFrame, but it's not working for some reason. Below you will find some test data and my functions. The first piece of code does some checking and splitting, and the second part appends each parsed record to a list called data. Here is some test data:

product/productId: B001E4KFG0
review/userId: A3SGXH7AUHU8GW
review/profileName: delmartian
review/helpfulness: 1/1
review/score: 5.0
review/time: 1303862400
review/summary: Good Quality Dog Food
review/text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.

product/productId: B00813GRG4
review/userId: A1D87F6ZCVE5NK
review/profileName: dll pa
review/helpfulness: 0/0
review/score: 1.0
review/time: 1346976000
review/summary: Not as Advertised
review/text: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".

Here is my code:

import pandas as pd
import numpy as np

def grab_next_entry(food_file):
    record={'id':-1,'helpfulness':'','number rated':'','score':'','review':''}
    line=food_file.readline()


    #food_dataframe=pd.DataFrame(columns=column_names)

    while line:
        if 'product/productId' in line:
            split_product_id=line.split(':')
            record['id']=split_product_id[1]

        if 'review/helpfulness' in line:
            split_helpfulness=line.split(':')
            split_helpfulness=split_helpfulness[1].split('/')
            record['helpfulness']=eval(split_helpfulness[0])
            record['number rated']=eval(split_helpfulness[-1])

        if 'review/score' in line:
            split_score = line.split(':')
            record['score']=eval(split_score[1])

        if 'review/text' in line:
            split_review_text=line.split('review/text:')
            record['review']=split_review_text[1:]

        if line=='\n':
            return record
        line=food_file.readline()

The next piece of code is creating the list and trying to put it into a pandas dataframe.

import os

fileLoc = "/Users/brawdyll/Documents/ds710fall2017assignment11/finefoods_test.txt"
column_names=('Product ID', 'People who voted Helpful','Total votes','Rating','Review')
food_dataframe=[]
data=[]
with open(fileLoc,encoding = "ISO 8859-1") as food_file:
    fs=os.fstat(food_file.fileno()).st_size
    num_read = 0
    while not food_file.tell()==fs:
        data.append(grab_next_entry(food_file))
        num_read+=1

Food_dataframe = pd.DataFrame(data,column_names)

print(Food_dataframe)

Upvotes: 0

Views: 78

Answers (1)

Sebastian Mendez

Reputation: 2981

There are many improvements that could be made to this code, but the reason your program isn't working is that you're passing column_names as the index. The second positional argument of pd.DataFrame is index, not columns, so the names are treated as row labels. Running:

df = pd.DataFrame(data)

will work just fine, and then setting:

df.columns = column_names

will give you the results you want.
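For illustration, here is a minimal sketch of that fix. The two records are hypothetical stand-ins shaped like the dicts that grab_next_entry returns (the review text is shortened), not output from the actual file:

```python
import pandas as pd

# Hypothetical records with the same keys, in the same order,
# as the dicts built by grab_next_entry in the question.
data = [
    {'id': 'B001E4KFG0', 'helpfulness': 1, 'number rated': 1,
     'score': 5.0, 'review': 'Good quality dog food.'},
    {'id': 'B00813GRG4', 'helpfulness': 0, 'number rated': 0,
     'score': 1.0, 'review': 'Not as advertised.'},
]
column_names = ['Product ID', 'People who voted Helpful',
                'Total votes', 'Rating', 'Review']

# Build the frame from the list of dicts; the dict keys become the columns.
df = pd.DataFrame(data)

# Rename the columns positionally. This relies on the dict key order
# matching the order of column_names.
df.columns = column_names

print(df)
```

Note that passing columns=column_names to the constructor would probably not do what you want here: with a list of dicts, that argument selects keys by name rather than renaming them, so assigning df.columns afterwards is the simpler route.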

Upvotes: 1
