samba

Reputation: 3111

Python - how to make CSV data samples without creating files?

I wanted to test concatenating multiple CSV files to make a single Pandas DataFrame:

pd_df = pd.concat(pd.read_csv(f, header=0) for f in csv_files_data)

This resulted in ValueError: Invalid file path or buffer object type: <class 'list'>

I'm creating CSV data samples like this:

csv_data_1 = [['ID', 'Metric_1', 'ProcessDate'],
              ['1', '-10.5', '1/12/2007'],
              ['2', '25.0', '1/22/2010']]
csv_data_2 = [['ID', 'Metric_1', 'ProcessDate'],
              ['3', '7.9', '10/14/2015'],
              ['4', '50.0', '5/19/2020']]

csv_files_data = [csv_data_1, csv_data_2]

I'm intentionally not reading from CSV files; instead, I'm trying to create the data samples directly in test code. Is there a way to create such CSV samples correctly so that I can pass them to pd.read_csv?

Upvotes: 1

Views: 172

Answers (2)

MrBean Bremen

Reputation: 16825

You could convert your lists to a conforming CSV string manually, then wrap each string in an in-memory io stream:

import io
import pandas as pd


def lists_to_csv(lists):
    """Join each row with commas and the rows with newlines,
    then wrap the resulting text in an in-memory stream."""
    lines = '\n'.join(','.join(row) for row in lists)
    return io.StringIO(lines)

csv_data_1 = [['ID', 'Metric_1', 'ProcessDate'],
              ['1', '-10.5', '1/12/2007'],
              ['2', '25.0', '1/22/2010']]
csv_data_2 = [['ID', 'Metric_1', 'ProcessDate'],
              ['3', '7.9', '10/14/2015'],
              ['4', '50.0', '5/19/2020']]

csv_files_data = [lists_to_csv(data) for data in (csv_data_1, csv_data_2)]

pd_df = pd.concat(pd.read_csv(f, header=0) for f in csv_files_data)
print(pd_df)

This outputs:

   ID  Metric_1 ProcessDate
0   1     -10.5   1/12/2007
1   2      25.0   1/22/2010
0   3       7.9  10/14/2015
1   4      50.0   5/19/2020
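
Joining the fields by hand works for this sample, but it would break if a field itself contained a comma or a quote. A minimal sketch of the same idea using the standard csv module, which quotes such fields automatically, could look like this. The helper name rows_to_csv_buffer and the sample rows are made up for illustration and are not part of the original answer:

```python
import csv
import io

import pandas as pd


def rows_to_csv_buffer(rows):
    """Serialize a list of rows into an in-memory CSV text buffer (hypothetical helper)."""
    buffer = io.StringIO()
    # csv.writer quotes fields that contain the delimiter or quote characters
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    buffer.seek(0)  # rewind so read_csv starts at the header row
    return buffer


# Illustrative samples with a comma inside a field
sample_1 = [["ID", "Name"], ["1", "Smith, John"]]
sample_2 = [["ID", "Name"], ["2", "Doe, Jane"]]

frames = [pd.read_csv(rows_to_csv_buffer(rows)) for rows in (sample_1, sample_2)]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Passing ignore_index=True to pd.concat is optional; it renumbers the rows instead of repeating each sample's own index.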

Upvotes: 2

panadestein

Reputation: 1291

Would this code fit your needs?

pd_df = pd.concat(pd.DataFrame(f) for f in csv_files_data)

The function read_csv expects a file path, a file object, or a buffer, not a list, which is why your call raised the ValueError.
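
One caveat with passing the lists straight to pd.DataFrame: the header row is then treated as an ordinary data row, and all values stay strings. A small sketch that splits off the header and converts the numeric columns explicitly might look like the following. The helper name rows_to_frame is hypothetical, and the dtype choices are an assumption about what the test needs:

```python
import pandas as pd

csv_data_1 = [['ID', 'Metric_1', 'ProcessDate'],
              ['1', '-10.5', '1/12/2007'],
              ['2', '25.0', '1/22/2010']]
csv_data_2 = [['ID', 'Metric_1', 'ProcessDate'],
              ['3', '7.9', '10/14/2015'],
              ['4', '50.0', '5/19/2020']]


def rows_to_frame(rows):
    """Use the first row as column names and the rest as data (hypothetical helper)."""
    header, *body = rows
    return pd.DataFrame(body, columns=header)


pd_df = pd.concat(
    (rows_to_frame(rows) for rows in (csv_data_1, csv_data_2)),
    ignore_index=True,
)
# Unlike read_csv, the DataFrame constructor does not infer numeric types from strings
pd_df = pd_df.astype({'ID': int, 'Metric_1': float})
print(pd_df)
```

This skips CSV parsing entirely, so it only helps if the test does not need to exercise pd.read_csv itself.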

Edit:

You could also dump your lists into a file object. If you don't mind using numpy, this could be a solution:

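from tempfile import TemporaryFile

import numpy as np
import pandas as pd

# csv_data_1 and csv_data_2 are the lists of lists from the question
fil_data_1 = TemporaryFile()
fil_data_2 = TemporaryFile()

csv_data_1 = np.array(csv_data_1)
csv_data_2 = np.array(csv_data_2)

# Write comma-separated rows so that pd.read_csv can split the columns
np.savetxt(fil_data_1, csv_data_1, fmt='%s', delimiter=',')
np.savetxt(fil_data_2, csv_data_2, fmt='%s', delimiter=',')

# Rewind to the start of each file (simulates closing and reopening)
_ = fil_data_1.seek(0)
_ = fil_data_2.seek(0)

pd_df = pd.concat(pd.read_csv(f, header=0) for f in [fil_data_1, fil_data_2])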

The code above uses the tempfile module to create temporary files, into which the numpy arrays generated from your lists are dumped. The corresponding output is:

   ID  Metric_1 ProcessDate
0   1     -10.5   1/12/2007
1   2      25.0   1/22/2010
0   3       7.9  10/14/2015
1   4      50.0   5/19/2020
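
If the goal is to avoid the file system altogether, np.savetxt also accepts a file-like object, so an io.StringIO buffer can stand in for the temporary file. The following is only a sketch under that assumption: the helper name array_to_buffer is hypothetical, and a comma delimiter is chosen so that pd.read_csv can split the columns with its default settings:

```python
import io

import numpy as np
import pandas as pd

csv_data_1 = [['ID', 'Metric_1', 'ProcessDate'],
              ['1', '-10.5', '1/12/2007'],
              ['2', '25.0', '1/22/2010']]
csv_data_2 = [['ID', 'Metric_1', 'ProcessDate'],
              ['3', '7.9', '10/14/2015'],
              ['4', '50.0', '5/19/2020']]


def array_to_buffer(rows):
    """Dump rows as comma-separated text into an in-memory buffer (hypothetical helper)."""
    buffer = io.StringIO()
    np.savetxt(buffer, np.array(rows), fmt='%s', delimiter=',')
    buffer.seek(0)  # rewind so read_csv starts at the header row
    return buffer


pd_df = pd.concat(
    (pd.read_csv(array_to_buffer(rows), header=0) for rows in (csv_data_1, csv_data_2)),
    ignore_index=True,  # renumber rows 0..3 instead of repeating 0, 1
)
print(pd_df)
```

Nothing is written to disk here, and the buffers are discarded once they go out of scope.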

Upvotes: 2
