user10553396

Numpy: load multiple CSV files as dictionaries

I want to use numpy's loadtxt to read .csv files for my experiment. I have three time-series datasets with different characteristics, all in the following format: the first column is the timestamp and the second column is the value.

0.086206438,10
0.086425551,12
0.089227066,20
0.089262508,24
0.089744425,30
0.090036815,40
0.090054172,28
0.090377569,28
0.090514071,28
0.090762872,28
0.090912691,27

For reproducibility, I have shared the three time-series data I am using here.

If I do it like the following

import numpy as np

fname="data1.csv"

col_time,col_window = np.loadtxt(fname,delimiter=',').T

This works as intended. However, instead of reading a single file, I want to pass a dictionary to np.loadtxt, as in col_time,col_window = np.loadtxt(types,delimiter=',').T, where the dictionary is defined as follows

protocols = {}
types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

so that I can read multiple CSV files and plot all the results at once using a single for loop, as in the following.

for protname, fname in types.items():
    col_time, col_window = protocols[protname]["col_time"], protocols[protname]["col_window"]
    rt = np.exp(np.diff(np.log(col_window)))
    plt.plot(quotient_times, quotient, ".", markersize=4, label=protname)
    plt.title(protname)
    plt.xlabel("t")
    plt.ylabel("values")
    plt.legend()
    plt.show()

But it is giving me an error ValueError: could not convert string to float: b'data1'. How can I load multiple csv files as a dictionary?

Upvotes: 1

Views: 1052

Answers (2)

Serge Ballesta

Reputation: 148890

Assuming that you want to build a protocols dict that will be useable in your code, you can easily build it with a simple loop:

types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}
protocols = {}

for name, file in types.items():
    col_time, col_window = np.loadtxt(file, delimiter=',').T
    protocols[name] = {'col_time': col_time, 'col_window': col_window}
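As for the error in the question: as far as I can tell, np.loadtxt treats anything that is not a file name or path as an iterable of text lines. Iterating over a dict yields its keys, so loadtxt ends up trying to parse "data1" as a number. A minimal sketch of that behaviour (the exact error message depends on the numpy version, so it is only printed here):

```python
import numpy as np

types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

# Iterating over a dict yields its keys, not its values.
lines_seen_by_loadtxt = list(types)
print(lines_seen_by_loadtxt)

try:
    # The dict is consumed as if each key were a line of CSV text.
    np.loadtxt(types, delimiter=",")
    failed = False
except ValueError as exc:
    failed = True
    print("loadtxt rejected the dict:", exc)
```

This is why the loop above calls loadtxt once per file name instead of passing the whole dictionary.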

You can then successfully plot the 3 graphs:

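import matplotlib.pyplot as plt

for protname, fname in types.items():
    col_time, col_window = protocols[protname]["col_time"], protocols[protname]["col_window"]
    # rt is computed as in the question but is not plotted here
    rt = np.exp(np.diff(np.log(col_window)))
    # plot the raw values against their timestamps, one figure per file
    plt.plot(col_time, col_window, ".", markersize=4, label=protname)
    plt.title(protname)
    plt.xlabel("t")
    plt.ylabel("values")
    plt.legend()
    plt.show()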
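If you want to plot rt rather than the raw values, keep in mind that np.diff returns one element fewer than its input, so rt has to be paired with a shortened time axis. The sketch below assumes that pairing each ratio with the later of its two timestamps is what you want; rt_times is a hypothetical name, not something from the question. It also shows that rt is simply the ratio of consecutive values:

```python
import numpy as np

# First four rows of the sample data from the question
col_time = np.array([0.086206438, 0.086425551, 0.089227066, 0.089262508])
col_window = np.array([10.0, 12.0, 20.0, 24.0])

# exp(diff(log(x))) is the ratio of each value to the previous one
rt = np.exp(np.diff(np.log(col_window)))
ratio = col_window[1:] / col_window[:-1]

# Assumption: attach each ratio to the timestamp of the later sample
rt_times = col_time[1:]

print(rt.shape, rt_times.shape)
print(np.allclose(rt, ratio))
```

With these two arrays, plt.plot(rt_times, rt, ".") works without a length mismatch.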

Upvotes: 1

MUNGAI NJOROGE

Reputation: 1216

Neither pandas nor numpy offers a single call that loads several CSV files at once. You can load each file separately and combine the results with the pandas concat function. The example below demonstrates this using pandas. Replace StringIO with a file object or a file path when reading real files.

data="""
0.086206438,10
0.086425551,12
0.089227066,20
0.089262508,24
0.089744425,30
0.090036815,40
0.090054172,28
0.090377569,28
0.090514071,28
0.090762872,28
0.090912691,27
"""
data2="""
0.086206438,29
0.086425551,32
0.089227066,50
0.089262508,54
"""
data3="""
0.086206438,69
0.086425551,72
0.089227066,70
0.089262508,74
"""
import pandas as pd
from io import StringIO

files = {"data1": data, "data2": data2, "data3": data3}
columns = ['data1', 'data2']
# Load the first file into a data frame
key = list(files.keys())[0]
df = pd.read_csv(StringIO(files[key]), header=None, usecols=[0, 1], names=columns)
print(df.head())
# Remove the file that is already loaded from the dictionary
files.pop(key, None)
print("final values")
# Concatenate the first data frame with the remaining files in a single call
df = pd.concat([df] + [pd.read_csv(StringIO(text), header=None, usecols=[0, 1], names=columns) for text in files.values()],
               ignore_index=True)
print(df.tail())

For more insight: pandas append vs concat
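If you need to keep track of which file each row came from, one option is to pass a dict of data frames to concat: its keys then become the outer level of the index. This is only a sketch. The column names col_time and col_window and the index level names source and row are my own choices, and the inline strings stand in for real files:

```python
from io import StringIO

import pandas as pd

# Inline stand-ins for the CSV files
files = {
    "data1": "0.086206438,10\n0.086425551,12\n",
    "data2": "0.086206438,29\n0.086425551,32\n",
}

# One data frame per file, keyed by the same names
frames = {
    name: pd.read_csv(StringIO(text), header=None, names=["col_time", "col_window"])
    for name, text in files.items()
}

# A dict passed to concat contributes its keys as the outer index level
df = pd.concat(frames, names=["source", "row"])

# Select the rows that came from one file
print(df.loc["data2"])
```

Each series can then be pulled out again with df.loc[name], which fits the one-loop plotting pattern from the question.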

Upvotes: 0
