Reputation: 403
I am creating a ticket tracking project where I have a pandas dataframe holding the ticket information. I am then storing this dataframe into a csv file. The dataframe will initialize at the start of the program.
One of the column values is a list. When you store the pandas dataframe in a csv file with this line of code: self.ticketDF.to_csv(self.ticketCSVFilePath)
, it surrounds the list in double quotes. When you then read it back in, it now is interpreted as a string, not a list. In my example, you can see the list with double quotes under the comments column.
Here is my file - tickets.csv:
Ticket ID,Subject,Project,Description,Priority,Comments
PROT-18,testSubject,testProject,testDescription,testPriority,"['comment1', 'comment2', 'comment3']"
PROT-19,testSubject,testProject,testDescription,testPriority,"['comment4', 'comment5', 'comment6']"
I am initializing the pandas dataframe, using these two functions:
def initializeTicketDF(self):
if path.exists(self.ticketCSVFilePath) and path.getsize(self.ticketCSVFilePath) > 0:
self.ticketDF = pd.read_csv(self.ticketCSVFilePath) #reading the csv file into the dataframe
self.ticketDF.set_index('Ticket ID', inplace=True) #I am setting the index to the Ticket ID
self.columnToList("Comments") #Calling my function that currently does the
#workaround
def columnToList(self, columnName):
count = 0 #this represents the current row in the dataframe
for x in self.ticketDF['Comments']: #x holds the "Comments" value for every row
x = x.replace('[', '') #replace left and right brackets
x = x.replace(']', '')
x = re.findall('\'([^\']*)\'', x) #get a list of all values between single quotes
self.ticketDF[columnName][count] = x #store the list back into the dataframe
count += 1
To work around this issue as shown above, I am replacing each bracket separately and then getting a list of all values between single quotations with this line: x = re.findall('\'([^\']*)\'', x)
I am then storing the list back into the dataframe row by row.
I have also tried using csv.DictReader/Writer and it does the same thing.
Is there a way to read the list in the csv without having to do any string modifications? Is there a regular expression I could use to clean up the list's string?
Any thoughts would be greatly appreciated. Thanks!
Upvotes: 1
Views: 915
Reputation: 92440
You can pass a converted for a column to pd.read_csv()
:
import pandas as pd
from ast import literal_eval
p = pd.read_csv(path, converters={'Comments':literal_eval})
p['Comments']
# 0 [comment1, comment2, comment3]
# 1 [comment4, comment5, comment6]
p['Comments'][0][1]
# 'comment2'
literal_eval
will safely evaluate simple expressions like your list.
Upvotes: 2