Jeff Gruenbaum
Jeff Gruenbaum

Reputation: 403

How do I read a list surrounded in double quotes from a csv file?

I am creating a ticket tracking project where I have a pandas dataframe holding the ticket information. I am then storing this dataframe into a csv file. The dataframe will initialize at the start of the program.

One of the column values is a list. When you store the pandas dataframe in a csv file with this line of code: self.ticketDF.to_csv(self.ticketCSVFilePath), it surrounds the list in double quotes. When you then read it back in, it now is interpreted as a string, not a list. In my example, you can see the list with double quotes under the comments column.

Here is my file - tickets.csv:

Ticket ID,Subject,Project,Description,Priority,Comments
PROT-18,testSubject,testProject,testDescription,testPriority,"['comment1', 'comment2', 'comment3']"
PROT-19,testSubject,testProject,testDescription,testPriority,"['comment4', 'comment5', 'comment6']"

I am initializing the pandas dataframe, using these two functions:

def initializeTicketDF(self):
   if path.exists(self.ticketCSVFilePath) and path.getsize(self.ticketCSVFilePath) > 0:
       self.ticketDF = pd.read_csv(self.ticketCSVFilePath)  #reading the csv file into the dataframe
       self.ticketDF.set_index('Ticket ID', inplace=True)   #I am setting the index to the Ticket ID
       self.columnToList("Comments")                        #Calling my function that currently does the 
                                                            #workaround

def columnToList(self, columnName):
   count = 0                                #this represents the current row in the dataframe
   for x in self.ticketDF['Comments']:      #x holds the "Comments" value for every row
       x = x.replace('[', '')               #replace left and right brackets
       x = x.replace(']', '')
       x = re.findall('\'([^\']*)\'', x)    #get a list of all values between single quotes
       self.ticketDF[columnName][count] = x #store the list back into the dataframe
       count += 1 

To work around this issue as shown above, I am replacing each bracket separately and then getting a list of all values between single quotations with this line: x = re.findall('\'([^\']*)\'', x) I am then storing the list back into the dataframe row by row.

I have also tried using csv.DictReader/Writer and it does the same thing.

Is there a way to read the list in the csv without having to do any string modifications? Is there a regular expression I could use to clean up the list's string?

Any thoughts would be greatly appreciated. Thanks!

Upvotes: 1

Views: 915

Answers (1)

Mark
Mark

Reputation: 92440

You can pass a converted for a column to pd.read_csv():

import pandas as pd
from ast import literal_eval

p = pd.read_csv(path, converters={'Comments':literal_eval})

p['Comments']
# 0    [comment1, comment2, comment3]
# 1    [comment4, comment5, comment6]

p['Comments'][0][1]
# 'comment2'

literal_eval will safely evaluate simple expressions like your list.

Upvotes: 2

Related Questions