Reputation: 3111
I wanted to test concatenating multiple CSV files to make a single Pandas DataFrame:
pd_df = pd.concat(pd.read_csv(f, header=0) for f in csv_files_data)
This resulted in ValueError: Invalid file path or buffer object type: <class 'list'>
I'm creating CSV data samples like this:
csv_data_1 = [['ID', 'Metric_1', 'ProcessDate'],
['1', '-10.5', '1/12/2007'],
['2', '25.0', '1/22/2010']]
csv_data_2 = [['ID', 'Metric_1', 'ProcessDate'],
['3', '7.9', '10/14/2015'],
['4', '50.0', '5/19/2020']]
csv_files_data = [csv_data_1, csv_data_2]
I'm intentionally not reading from CSV files; instead I'm trying to create the data samples directly in test code. Is there a way to create such CSV samples correctly so that I can pass them to pd.read_csv?
Upvotes: 1
Views: 172
Reputation: 16825
You could convert your lists to a conforming CSV string manually, then wrap that string in an io.StringIO stream:
import io
import pandas as pd
def lists_to_csv(lists):
    """Make a comma-separated string from each list,
    then join the strings with newlines."""
    lines = '\n'.join(','.join(row) for row in lists)
    return io.StringIO(lines)
csv_data_1 = [['ID', 'Metric_1', 'ProcessDate'],
['1', '-10.5', '1/12/2007'],
['2', '25.0', '1/22/2010']]
csv_data_2 = [['ID', 'Metric_1', 'ProcessDate'],
['3', '7.9', '10/14/2015'],
['4', '50.0', '5/19/2020']]
csv_files_data = [lists_to_csv(data) for data in (csv_data_1, csv_data_2)]
pd_df = pd.concat(pd.read_csv(f, header=0) for f in csv_files_data)
print(pd_df)
This outputs:
   ID  Metric_1  ProcessDate
0   1     -10.5    1/12/2007
1   2      25.0    1/22/2010
0   3       7.9   10/14/2015
1   4      50.0    5/19/2020
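One caveat with joining the fields by hand: a value that itself contains a comma or a quote would produce a malformed line. A minimal sketch of an alternative, assuming you are fine with the standard csv module doing the escaping, is below. The helper name rows_to_csv_buffer and the Note column are made up for illustration and are not part of the question's data; ignore_index=True is optional and only renumbers the combined rows.

```python
import csv
import io

import pandas as pd


def rows_to_csv_buffer(rows):
    """Write rows with csv.writer so commas and quotes inside fields are escaped."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    buffer.seek(0)  # rewind so read_csv starts at the first line
    return buffer


# Hypothetical samples with awkward field values
sample_1 = [["ID", "Note"], ["1", "plain"], ["2", "has, comma"]]
sample_2 = [["ID", "Note"], ["3", 'has "quotes"']]

buffers = [rows_to_csv_buffer(rows) for rows in (sample_1, sample_2)]
quoted_df = pd.concat(
    (pd.read_csv(b, header=0) for b in buffers),
    ignore_index=True,  # renumber rows 0..n-1 instead of repeating 0, 1
)
print(quoted_df)
```

With this version the embedded comma and quotes should come back as part of the Note values instead of splitting the row.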
Upvotes: 2
Reputation: 1291
Would this code fit your needs?
pd_df = pd.concat(pd.DataFrame(f[1:], columns=f[0]) for f in csv_files_data)
The function pd.read_csv expects a file path, a file object, or a buffer, not a list.
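If the goal is only to have small samples in test code, one option is to write each sample as CSV text and wrap it in a buffer. This is a rough sketch, assuming the samples can be written as literal strings; the names csv_text_1 and text_df are made up for illustration, and the values mirror the first list in the question.

```python
import io

import pandas as pd

# Hypothetical sample: the same rows as csv_data_1, written as CSV text
csv_text_1 = """ID,Metric_1,ProcessDate
1,-10.5,1/12/2007
2,25.0,1/22/2010
"""

# io.StringIO turns the string into a file-like object that read_csv accepts
text_df = pd.read_csv(io.StringIO(csv_text_1), header=0)
print(text_df.dtypes)
```

Because the text goes through read_csv, the columns get the parser's inferred types rather than staying as strings.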
You could dump your lists into a file object; if you don't mind using NumPy, this is one solution:
from tempfile import TemporaryFile
import numpy as np
import pandas as pd
fil_data_1 = TemporaryFile()
fil_data_2 = TemporaryFile()
csv_data_1 = np.array(csv_data_1)
csv_data_2 = np.array(csv_data_2)
# Write one comma-separated line per row so read_csv can split the columns
np.savetxt(fil_data_1, csv_data_1, fmt='%s', delimiter=',')
np.savetxt(fil_data_2, csv_data_2, fmt='%s', delimiter=',')
# Rewind both files so read_csv starts at the first line
_ = fil_data_1.seek(0)
_ = fil_data_2.seek(0)
pd_df = pd.concat(pd.read_csv(f, header=0) for f in [fil_data_1, fil_data_2])
The code above creates temporary files with the tempfile module and writes the NumPy arrays built from your lists into them. The corresponding output is:
   ID  Metric_1  ProcessDate
0   1     -10.5    1/12/2007
1   2      25.0    1/22/2010
0   3       7.9   10/14/2015
1   4      50.0    5/19/2020
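If the code under test takes file paths rather than open file objects, a variation is to write the samples into a temporary directory and pass the paths to read_csv. This is only a sketch under that assumption; the file names part_1.csv and part_2.csv and the variable names are made up, and it uses the csv module instead of NumPy to write the rows.

```python
import csv
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical file names mapped to the sample rows from the question
samples = {
    "part_1.csv": [["ID", "Metric_1", "ProcessDate"],
                   ["1", "-10.5", "1/12/2007"],
                   ["2", "25.0", "1/22/2010"]],
    "part_2.csv": [["ID", "Metric_1", "ProcessDate"],
                   ["3", "7.9", "10/14/2015"],
                   ["4", "50.0", "5/19/2020"]],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for name, rows in samples.items():
        path = Path(tmp_dir) / name
        # newline="" lets csv.writer control the line endings itself
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        paths.append(path)
    # Read while the directory still exists; it is deleted when the block ends
    path_df = pd.concat(
        (pd.read_csv(p, header=0) for p in paths),
        ignore_index=True,
    )

print(path_df)
```

The directory and both files are removed automatically when the with block exits, so nothing is left behind after the test.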
Upvotes: 2