Getting data arrays from CSV with loops

Question

I have a CSV that looks like this:

0.500187550,CPU1,7.93
0.500187550,CPU2,1.62
0.500187550,CPU3,7.93
0.500187550,CPU4,1.62
1.000445359,CPU1,9.96
1.000445359,CPU2,1.61
1.000445359,CPU3,9.96
1.000445359,CPU4,1.61
1.500674877,CPU1,9.94
1.500674877,CPU2,1.61
1.500674877,CPU3,9.94
1.500674877,CPU4,1.61

The first column is time, the second the CPU used and the third is energy.

As a final result I would like to have these arrays:

Time:

[0.500187550, 1.000445359, 1.500674877]

Energy (per CPU): e.g. CPU1

[7.93, 9.96, 9.94]

For parsing the CSV I'm using:

query = csv.reader(csvfile, delimiter=',', skipinitialspace=True)
#Arrays global time and power:
for row in query:
    x = row[0]
    x = float(x)
    x_array.append(x) #column 0 to array
    y = row[2]
    y = float(y)
    y_array.append(y) #column 2 to array
print x_array
print y_array

This way I get all the time and energy data into two arrays: x_array and y_array.

Then I order the arrays:

energy_core_ord_array = []
time_ord_array = []
#Dividing array into energy and time per core:
for i in range(number_cores[0]):
    e =  0 + i
    for j in range(len(x_array)/(int(number_cores[0]))):
        time_ord = x_array[e]
        time_ord_array.append(time_ord)
        energy_core_ord = y_array[e]
        energy_core_ord_array.append(energy_core_ord)
        e = e + int(number_cores[0])

Lastly, I truncate the time array to the length it should have:

final_time_ord_array = []
for i in range(len(x_array)/(int(number_cores[0]))):
    final_time_ord = time_ord_array[i]
    final_time_ord_array.append(final_time_ord)

Up to here the code works, although it is not elegant. The problem comes when I try to get the array for each core.

I get it for the first core, but I don't know how to iterate over the remaining cores, or how to store each array in its own named variable, for example.

final_energy_core_ord_array = []
#Truncate energy core array:
for i in range(len(x_array)/(int(number_cores[0]))):
    final_energy_core_ord = energy_core_ord_array[i]
    final_energy_core_ord_array.append(final_energy_core_ord)

Simon · Accepted Answer

Using pandas (a Python library for working with dataframes) you can do something like this, which is much quicker than processing the CSV manually as you are doing:

import pandas as pd

csvfile = "C:/Users/Simon/Desktop/test.csv"

data = pd.read_csv(csvfile, header=None, names=['time','cpu','energy'])

times = list(pd.unique(data.time.ravel()))

print times

cpuList = data.groupby(['cpu'])

cpuEnergy = {}

# Assumes the CPUs are named CPU1..CPUn, as in the sample data
for i in range(len(cpuList)):
    curCPU = 'CPU' + str(i+1)
    cpuEnergy[curCPU] = list(cpuList.get_group(curCPU)['energy'])

for k, v in cpuEnergy.items():
    print k, v

That will give the following as output:

[0.50018755000000004, 1.000445359, 1.5006748769999998]
CPU4 [1.6200000000000001, 1.6100000000000001, 1.6100000000000001]
CPU2 [1.6200000000000001, 1.6100000000000001, 1.6100000000000001]
CPU3 [7.9299999999999997, 9.9600000000000009, 9.9399999999999995]
CPU1 [7.9299999999999997, 9.9600000000000009, 9.9399999999999995]
