Reputation: 319
I have 8 .csv files with 16 columns and "n" rows with no Header. I want to parse each of these .csv and get column [0, 8] [#where column 0 is the value of x, y, z etc and column 8 is always the value of a#] and put the data into a new.csv. Once this is done, the new.csv should have 16 columns (2 from each input.csv) and "n" rows.
Now, I want to just take the average of Column[1,3,5,7,9,11,13,15] from new.csv and append it to another file or this file. Basically in the new csv, i want to have the average of colum[8] form the input files and each of the column[0] from the input files. So the final output expecting should have a shape of 9 columns and n rows. sample input files:
a.csv:
5.42E+05 6.52E+05 2.17E+04 2.73E+04 2.58E+04 2.33E+04 2.81E+04 3.37E+04 1.08E+08 1.10E+08 2.54E+05 3.21E+05 2.99E+05 2.74E+05 3.39E+05 4.07E+05
4.64E+04 1.15E+06 1.96E+04 2.53E+04 2.39E+04 2.37E+04 1.98E+04 2.85E+04 6.18E+05 2.17E+08 2.30E+05 3.02E+05 2.75E+05 2.77E+05 2.33E+05 3.42E+05
4.36E+04 1.13E+06 5.72E+04 2.71E+04 2.77E+04 2.37E+04 2.62E+04 7.35E+04 5.78E+05 2.17E+08 9.26E+05 3.25E+05 3.20E+05 2.72E+05 3.20E+05 1.46E+06
4.32E+04 1.02E+06 1.47E+05 2.63E+04 3.05E+04 2.26E+04 2.89E+04 2.45E+04 5.70E+05 2.15E+08 2.78E+06 3.02E+05 3.58E+05 2.63E+05 3.49E+05 2.87E+05
4.44E+04 7.83E+05 1.58E+05 2.95E+04 2.71E+05 2.71E+04 3.67E+04 2.85E+04 5.86E+05 1.61E+08 2.89E+06 3.48E+05 5.39E+07 3.14E+05 4.49E+05 3.39E+05
1.47E+05 1.02E+06 2.09E+04 2.72E+04 2.66E+04 6.18E+04 3.50E+04 3.00E+04 2.72E+06 2.15E+08 2.46E+05 3.18E+05 3.07E+05 9.91E+05 7.18E+05 3.71E+05
1.81E+05 7.67E+05 1.94E+04 5.05E+04 2.62E+04 4.50E+04 2.92E+04 2.86E+04 3.16E+06 1.61E+08 2.28E+05 4.84E+06 3.10E+05 5.31E+06 3.49E+05 3.58E+05
4.94E+05 1.34E+05 6.99E+04 8.76E+05 5.51E+04 5.27E+04 3.34E+05 1.30E+05 1.35E+07 3.59E+06 1.66E+06 1.64E+08 1.03E+06 1.12E+06 5.56E+07 3.37E+06
4.79E+04 1.38E+05 2.66E+04 1.02E+06 2.85E+04 2.88E+04 2.89E+04 3.26E+04 6.12E+05 2.72E+06 3.21E+05 2.15E+08 3.29E+05 3.39E+05 3.40E+05 4.04E+05
4.51E+04 6.44E+05 3.02E+04 5.24E+05 2.72E+04 1.89E+04 2.42E+04 3.21E+04 5.97E+05 1.10E+08 3.65E+05 1.07E+08 3.17E+05 2.17E+05 2.85E+05 3.80E+05
b.csv:
4.25E+03 1.83E+03 1.09E+03 1.35E+03 1.18E+03 1.24E+03 1.16E+03 1.28E+03 1.08E+08 1.10E+08 2.51E+05 3.13E+05 2.80E+05 2.64E+05 3.23E+05 3.32E+05
4.47E+03 2.20E+03 1.16E+03 1.46E+03 1.28E+03 1.21E+03 1.17E+03 1.36E+03 6.01E+05 2.17E+08 2.92E+05 3.59E+05 3.34E+05 2.84E+05 3.14E+05 3.86E+05
5.12E+03 1.85E+03 1.62E+03 1.59E+03 1.93E+03 1.36E+03 1.36E+03 1.42E+03 7.19E+05 2.16E+08 1.60E+06 7.14E+06 7.10E+05 8.74E+05 8.67E+05 1.37E+06
4.32E+03 1.53E+03 2.03E+03 1.11E+03 1.18E+03 1.18E+03 1.52E+03 1.18E+03 5.81E+05 2.15E+08 2.70E+06 2.84E+05 3.24E+05 3.12E+05 4.25E+05 3.65E+05
4.64E+03 1.53E+03 2.07E+03 1.15E+03 1.15E+03 1.25E+03 1.50E+03 1.13E+03 1.17E+06 1.61E+08 2.74E+06 2.98E+05 2.82E+05 5.38E+07 4.16E+05 3.41E+05
5.03E+03 1.61E+03 1.17E+03 1.15E+03 1.02E+03 1.12E+03 1.40E+03 1.43E+03 2.56E+06 2.16E+08 2.37E+05 2.57E+05 2.43E+05 2.65E+05 4.03E+05 4.43E+05
5.11E+03 1.37E+03 1.24E+03 1.20E+03 1.21E+03 1.10E+03 1.28E+03 1.34E+03 3.09E+06 1.61E+08 2.84E+05 2.93E+05 2.91E+05 2.34E+05 5.40E+07 3.07E+05
5.79E+03 2.51E+03 2.15E+03 2.21E+03 3.57E+03 1.67E+03 2.61E+03 2.28E+03 3.08E+06 4.98E+06 3.60E+06 1.63E+08 7.06E+06 1.95E+06 5.74E+07 3.44E+06
4.49E+03 1.88E+03 1.22E+03 1.47E+03 1.23E+03 1.04E+03 1.42E+03 1.37E+03 6.11E+05 2.67E+06 2.93E+05 2.15E+08 3.31E+05 2.26E+05 4.13E+05 3.53E+05
4.50E+03 2.22E+03 1.40E+03 1.34E+03 1.26E+03 1.22E+03 1.18E+03 1.35E+03 6.43E+05 1.10E+08 3.31E+05 1.07E+08 3.50E+05 3.29E+05 3.69E+05 4.26E+05
c.csv:
1.30E+06 4.34E+05 4.66E+04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.62E+08 5.65E+07 6.02E+06 3.24E+05 3.55E+05 2.83E+05 3.41E+05 4.05E+05
0.00E+00 1.74E+06 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 5.61E+05 2.17E+08 3.12E+05 3.34E+05 2.83E+05 2.83E+05 3.01E+05 3.45E+05
0.00E+00 1.74E+06 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.08E+05 2.17E+08 8.92E+05 3.47E+05 3.43E+05 2.22E+05 3.64E+05 2.38E+05
0.00E+00 1.74E+06 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 5.61E+05 2.15E+08 2.90E+06 3.35E+05 3.08E+05 5.85E+05 3.60E+05 3.81E+05
0.00E+00 1.74E+06 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 5.45E+05 2.15E+08 2.90E+06 3.11E+05 3.06E+05 2.88E+05 3.73E+05 3.10E+05
0.00E+00 1.30E+06 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 9.22E+04 4.90E+06 1.65E+08 8.92E+05 3.07E+06 1.37E+06 3.40E+06 1.53E+06 1.52E+07
0.00E+00 1.74E+06 0.00E+00 4.69E+04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.09E+06 2.15E+08 3.08E+05 6.15E+06 3.48E+05 3.63E+05 3.85E+05 4.12E+05
0.00E+00 0.00E+00 0.00E+00 1.31E+06 0.00E+00 0.00E+00 4.36E+05 0.00E+00 3.06E+06 1.35E+06 2.31E+05 1.61E+08 2.89E+05 2.05E+05 5.41E+07 1.77E+06
0.00E+00 0.00E+00 0.00E+00 1.74E+06 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.69E+05 2.27E+06 3.02E+05 2.16E+08 3.27E+05 3.08E+05 3.50E+05 3.75E+05
0.00E+00 8.69E+05 0.00E+00 8.71E+05 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.68E+05 1.10E+08 3.07E+05 1.08E+08 3.67E+05 2.34E+05 3.71E+05 3.78
Final expected output (after averaging column 8):
5.42E+05 4.25E+03 1.30E+06 125650487
4.64E+04 4.47E+03 0.00E+00 593233.3333
4.36E+04 5.12E+03 0.00E+00 634780
4.32E+04 4.32E+03 0.00E+00 570865
4.44E+04 4.64E+03 0.00E+00 766418
1.47E+05 5.03E+03 0.00E+00 3393342.667
1.81E+05 5.11E+03 0.00E+00 3113608.333
4.94E+05 5.79E+03 0.00E+00 6532673.333
4.79E+04 4.49E+03 0.00E+00 630900.3333
4.51E+04 4.50E+03 0.00E+00 636023
Then I was to do the looping for all the 16 columns (taking the following sets as[n,n+8] where n=0 to 7.
Apologize for the lengthy description, but I can't seem to append the columns in python. Thanks in advance.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Here is a sample code I started with:
import csv
import numpy as np
import sys
import pandas as pd
import glob
damn = ("a", "b", "c","e","f","g","h","i")
data = []
for fles in range(len(damn)):
core0data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(0,8))
#core1data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(1,9))
#core2data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(2,10))
#core3data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(3,11))
#core4data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(4,12))
#core5data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(5,13))
#core6data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(6,14))
#core7data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(7,15))
data.append(core0data)
np.savetxt("writer.csv", data, delimiter= ",")
But after running this, I get error:
> python2.7 new.py
Traceback (most recent call last):
File "new.py", line 20, in <module>
np.savetxt("writer.csv", data, delimiter= ",")
File "~/anaconda/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1083, in savetxt
fh.write(asbytes(format % tuple(row) + newline))
TypeError: float argument required, not numpy.ndarray
Upvotes: 1
Views: 918
Reputation: 180411
This will join the rows and write them to a new csv:
import csv
from glob import iglob
from itertools import chain
data = []
for file in iglob("*.csv"):
with open(file) as f:
r = csv.reader(f)
data.append(list(chain.from_iterable((float(row[0]), float(row[8])) for row in r)))
with open("new.csv","w") as out:
wr = csv.writer(out)
wr.writerows(zipped)
Passing the data to pandas:
data = []
for file in iglob("*.csv"):
with open(file) as f:
r = csv.reader(f)
data.append(list(chain.from_iterable((float(row[0]), float(row[8])) for row in r)))
zipped = zip(*data)
import pandas as pd
df = pd.DataFrame(zipped)
print(df[0].mean())
print(df[1].mean())
print(df[2].mean())
print(df)
Output:
69610.0
3093.0
103830.0
0 1 2
0 1300000 4250 542000
1 162000000 108000000 108000000
2 0 4470 46400
3 561000 601000 618000
4 0 5120 43600
5 608000 719000 578000
6 0 4320 43200
7 561000 581000 570000
8 0 4640 44400
9 545000 1170000 586000
10 0 5030 147000
11 4900000 2560000 2720000
12 0 5110 181000
13 3090000 3090000 3160000
14 0 5790 494000
15 3060000 3080000 13500000
16 0 4490 47900
17 669000 611000 612000
18 0 4500 45100
19 668000 643000 597000
get the mean of each row:
print(df.mean(1))
Output:
0 615416.666667
1 11660.000000
2 16956.666667
3 9953.333333
4 16240.000000
5 24973.333333
6 15840.000000
7 8560.000000
8 16346.666667
9 9876.666667
10 50676.666667
11 41210.000000
12 62036.666667
13 9980.000000
14 166596.666667
15 44093.333333
16 17463.333333
17 11323.333333
18 16533.333333
19 11150.000000
dtype: float64
Add that column:
df[3] = df.mean(1)
print(df)
Output:
0 1 2 3
0 1300000 4250 542000 615416.666667
1 0 1280 33700 11660.000000
2 0 4470 46400 16956.666667
3 0 1360 28500 9953.333333
4 0 5120 43600 16240.000000
5 0 1420 73500 24973.333333
6 0 4320 43200 15840.000000
7 0 1180 24500 8560.000000
8 0 4640 44400 16346.666667
9 0 1130 28500 9876.666667
10 0 5030 147000 50676.666667
11 92200 1430 30000 41210.000000
12 0 5110 181000 62036.666667
13 0 1340 28600 9980.000000
14 0 5790 494000 166596.666667
15 0 2280 130000 44093.333333
16 0 4490 47900 17463.333333
17 0 1370 32600 11323.333333
18 0 4500 45100 16533.333333
19 0 1350 32100 11150.000000
Save to csv:
df.to_("new.csv",sep=" ")
Output:
0 1 2 3
0 1300000.0 4250.0 542000.0 615416.6666666666
1 0.0 1280.0 33700.0 11660.0
2 0.0 4470.0 46400.0 16956.666666666668
3 0.0 1360.0 28500.0 9953.333333333334
4 0.0 5120.0 43600.0 16240.0
5 0.0 1420.0 73500.0 24973.333333333332
6 0.0 4320.0 43200.0 15840.0
7 0.0 1180.0 24500.0 8560.0
8 0.0 4640.0 44400.0 16346.666666666666
9 0.0 1130.0 28500.0 9876.666666666666
10 0.0 5030.0 147000.0 50676.666666666664
11 92200.0 1430.0 30000.0 41210.0
12 0.0 5110.0 181000.0 62036.666666666664
13 0.0 1340.0 28600.0 9980.0
14 0.0 5790.0 494000.0 166596.66666666666
15 0.0 2280.0 130000.0 44093.333333333336
16 0.0 4490.0 47900.0 17463.333333333332
17 0.0 1370.0 32600.0 11323.333333333334
18 0.0 4500.0 45100.0 16533.333333333332
19 0.0 1350.0 32100.0 11150.0
If you don't want the names and row indexes:
df.to_csv("new.csv",sep=" ",index=False,header=False)
Output:
1300000.0 4250.0 542000.0 615416.6666666666
0.0 1280.0 33700.0 11660.0
0.0 4470.0 46400.0 16956.666666666668
0.0 1360.0 28500.0 9953.333333333334
0.0 5120.0 43600.0 16240.0
0.0 1420.0 73500.0 24973.333333333332
0.0 4320.0 43200.0 15840.0
0.0 1180.0 24500.0 8560.0
0.0 4640.0 44400.0 16346.666666666666
0.0 1130.0 28500.0 9876.666666666666
0.0 5030.0 147000.0 50676.666666666664
92200.0 1430.0 30000.0 41210.0
0.0 5110.0 181000.0 62036.666666666664
0.0 1340.0 28600.0 9980.0
0.0 5790.0 494000.0 166596.66666666666
0.0 2280.0 130000.0 44093.333333333336
0.0 4490.0 47900.0 17463.333333333332
0.0 1370.0 32600.0 11323.333333333334
0.0 4500.0 45100.0 16533.333333333332
0.0 1350.0 32100.0 11150.0
Upvotes: 2
Reputation: 3009
Try out pandas. Among the many benefits, will be the explicit column labeling and column slicing.
Haven't fully replicated but maybe we can work to carve a full pandas solution.
import pandas as pd
filenames = ['a.csv','b.csv','c.csv']
for i,filename in enumerate(filenames):
df = pd.read_csv(filename,header=None)
df.columns = df.columns + 1 #
filenames[i] = df
dfs = pd.concat(filenames,axis=1)
print dfs.loc[:,[1,8]].head()
cols = [1,3]
print dfs[cols].mean()
Upvotes: 0