Reputation: 213
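Hi, my problem is as follows. I saved a dataframe ("test") to a csv file (test.csv) and loaded it again (read_csv), but the vector column comes back as a string.
A value from the saved csv file after pd.read_csv("test.csv"):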
'[[0. 0. 0. 0.123333. 0.\n 0.]\n
[0. 0. 0.\n 0.123333. 0. 0.]\n
[0. 0.222222. 0. 0.333333. 0. 0.]]'
test = pd.read_csv("test.csv")
np.array(literal_eval(test["vector"][0]))
I get this error:
File "<unknown>", line 1
[[0. 0. 0. 0. 0. 0.
^
SyntaxError: invalid syntax
Here is a download link for the file I use: https://drive.google.com/file/d/1MnJjPb-Gj_44dRXUHbNO64b-Z-wSrHSc/view?usp=sharing
Code to create the vectors and put them in a dataframe:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer()
tfidf_vectorizer.fit_transform(["example text", "this is the list of words", "like this"]).toarray()
datadd = [["example text"], ["this is the list of words"], ["like this"]]
vector = []
for example in datadd:
    # each example is a one-element list, so toarray() returns a 2-D array with one row
    vector.append(tfidf_vectorizer.transform(example).toarray())
df = pd.DataFrame({"vector": vector})
df.to_csv("test.csv")
Upvotes: 0
Views: 1262
Reputation: 149075
A csv file is a plain text file. Just open it with a text editor like Notepad++, vi, or even Notepad if you are using Windows. That means that what is saved in the csv file for each cell is just its text representation.
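To make the point about text representations concrete, here is a small sketch (the sample array is made up for illustration). It compares the text form of a numpy array with the text form of the equivalent nested list. Only the second one is a valid Python literal, which would explain the SyntaxError in the question.

```python
from ast import literal_eval

import numpy as np

# Made-up sample standing in for one cell of the "vector" column
arr = np.array([[0.0, 0.5, 0.25]])

# str() of an ndarray separates the values with spaces, not commas
array_text = str(arr)
# str() of a nested list uses commas, so it is a valid Python literal
list_text = str(arr.tolist())

try:
    literal_eval(array_text)
    array_text_parses = True
except SyntaxError:
    # the space-separated form is not valid Python syntax
    array_text_parses = False

# The list form survives the round trip through text
restored = np.array(literal_eval(list_text))
```

So the information is all there in the file, but only the list form can be handed to literal_eval directly.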
Pandas read_csv is smart enough to recognize floating point and integer values, but not lists, sets or numpy arrays. For date values, the parse_dates parameter can help, but AFAIK, nothing exists for numpy arrays. BTW, storing numpy arrays (or lists or other complex objects) in a pandas column is not a very good idea because pandas will not be able to use its vectorized methods on them. Long story short, and IMHO, storing complex objects in pandas is misusing the tool.
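One way to avoid complex objects in a column, as a rough sketch: store one plain float column per vector component, so that read_csv can parse everything natively. The sample matrix and the column names f0, f1, ... are assumptions made up for this example, and an in-memory buffer stands in for the csv file.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample: two vectors with three components each
matrix = np.array([[0.0, 0.5, 0.25],
                   [1.0, 0.0, 0.125]])

# One float column per component (hypothetical names f0, f1, f2)
columns = [f"f{i}" for i in range(matrix.shape[1])]
df = pd.DataFrame(matrix, columns=columns)

# Write to an in-memory buffer instead of a real file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# read_csv recognizes the float columns without any converter
restored = pd.read_csv(buffer).to_numpy()
```

With this layout the vectorized pandas methods keep working on every column, at the cost of a wide table when the vocabulary is large.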
Unfortunately, I know of no simple way to convert a string representation (as built by str(arr)) back to a numpy array. So if you want to go that way, you will have to write a parser in Python for it, and then apply it to the pandas column.
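For what it is worth, such a parser could look roughly like the sketch below. The function name parse_array_string is hypothetical, and it assumes the cell holds the str() form of a 2-D float array that numpy printed in full. Arrays with more than 1000 elements are abbreviated with "..." by default when printed, and those cannot be recovered this way.

```python
import re

import numpy as np


def parse_array_string(text):
    """Rebuild a 2-D float array from its str() form (hypothetical helper)."""
    # Every innermost [...] group is one row; line breaks inside a row
    # are plain whitespace and disappear in split()
    rows = re.findall(r"\[([^\[\]]*)\]", text)
    return np.array([[float(token) for token in row.split()] for row in rows])


# Made-up sample to check the round trip
original = np.array([[0.0, 0.5, 0.25],
                     [1.0, 0.0, 0.125]])
parsed = parse_array_string(str(original))
```

It can then be applied with test["vector"].apply(parse_array_string), or passed to read_csv through the converters parameter.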
Upvotes: 0
Reputation: 62463
The values in vector come from a <class 'scipy.sparse.csr.csr_matrix'>. Convert vector to a list before loading it into the dataframe.
Apply literal_eval to the entire column when reading the file in.
import pandas as pd
import numpy as np
from ast import literal_eval
# before writing vector to a dataframe
vector = np.array(vector).tolist()
df = pd.DataFrame({"vector": vector})
df.to_csv("test.csv", index=False)
# after reading the csv file in
test = pd.read_csv('test.csv', converters={'vector': literal_eval})
print(type(test.iloc[0, 0]))
# prints: <class 'list'>
Upvotes: 1