Pandas: array values become strings after saving to CSV and reading back

Question

Hi, my problem is as follows:

1. Compute some vectors.
2. Put them in a column of a pandas DataFrame (the column name is "vector").
3. Save the DataFrame as a CSV file (test.csv).
4. Read the saved file back: pd.read_csv("test.csv")
5. Find that the vectors are no longer NumPy arrays but strings like the one below.

  '[[0.   0.   0.   0.123333.   0.
    0.]

    [0.   0.   0.
   0.123333.   0.    0.]

    [0.   0.222222.   0.   0.333333.   0.    0.]]'

I tried something like this to solve the problem:

  test = pd.read_csv("test.csv")    
  np.array(literal_eval(test["vector"][0]))

I get this error:

     File "<unknown>", line 1
        [[0.         0.         0.         0.         0.         0.
                      ^
    SyntaxError: invalid syntax
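
A minimal sketch of why this fails, assuming the CSV cell holds the array's str() form. The small array below is made up for illustration and is not taken from the linked file:

```python
import numpy as np
from ast import literal_eval

# Hypothetical stand-in for one TF-IDF row
arr = np.array([[0.0, 0.5, 0.0]])

# The string form of a NumPy array separates elements with spaces, not commas
as_written = str(arr)
print(as_written)

# literal_eval only accepts valid Python literals, so the missing commas
# make it raise SyntaxError
try:
    literal_eval(as_written)
    parsed_ok = True
except SyntaxError:
    parsed_ok = False

# The string form of a nested list keeps the commas and parses cleanly
as_list = str(arr.tolist())
round_trip = literal_eval(as_list)
print(parsed_ok, round_trip)
```

The difference between the two string forms is the whole problem: only the list form is a valid Python literal.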

Here is a download link for the file I use: https://drive.google.com/file/d/1MnJjPb-Gj_44dRXUHbNO64b-Z-wSrHSc/view?usp=sharing

Code to create the vectors and put them in a DataFrame:

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer

    # fit the vectorizer on the full corpus
    tfidf_vectorizer = TfidfVectorizer()
    tfidf_vectorizer.fit(["example text","this is the list of words","like this"])

    datadd = [["example text"],["this is the list of words"],["like this"]]
    vector = []
    for example in datadd:
        # each entry is a 2-D NumPy array with one row
        vector.append(tfidf_vectorizer.transform(example).toarray())
    df = pd.DataFrame({"vector":vector})
    df.to_csv("test.csv")

Trenton McKinney · Accepted Answer

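vector is a list of NumPy arrays. When the dataframe is written to CSV, each array is saved as its string representation, which has no commas between the values, so literal_eval cannot parse it.
- Convert it to a list before loading it into the dataframe.
- Apply literal_eval to the entire column when reading the file in.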

import pandas as pd
import numpy as np
from ast import literal_eval

# before writing vector to a dataframe
vector = np.array(vector).tolist()
df = pd.DataFrame({"vector": vector})
df.to_csv("test.csv", index=False)

# after reading the csv file in
test = pd.read_csv('test.csv', converters={'vector': literal_eval})
print(type(test.iloc[0, 0]))  # <class 'list'>
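
If NumPy arrays are needed again after loading, the parsed lists can be converted back. The following is a rough sketch of the full round trip, not part of the original answer. It assumes two small made-up arrays in place of the TF-IDF output and uses an in-memory buffer instead of test.csv:

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd

# Hypothetical stand-in for the TF-IDF output: one 2-D array per row
vector = [np.array([[0.0, 0.5, 0.0]]), np.array([[1.0, 0.0, 0.25]])]

# Store plain nested lists so the CSV text contains commas
df = pd.DataFrame({"vector": [v.tolist() for v in vector]})

buffer = io.StringIO()  # in-memory stand-in for "test.csv"
df.to_csv(buffer, index=False)
buffer.seek(0)

# Parse each cell back into a nested list while reading
restored = pd.read_csv(buffer, converters={"vector": literal_eval})

# Convert the lists back to arrays, one per row, then stack them
arrays = [np.array(cell) for cell in restored["vector"]]
stacked = np.stack(arrays)
print(type(arrays[0]), stacked.shape)
```

Converting after the read keeps the converter itself simple. Stacking only works when every row has the same shape, as it does for TF-IDF rows produced by one fitted vectorizer.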
