user3285014
user3285014

Reputation: 319

Append to a new csv from other csv columns in python

I have 8 .csv files with 16 columns and "n" rows with no Header. I want to parse each of these .csv and get column [0, 8] [#where column 0 is the value of x, y, z etc and column 8 is always the value of a#] and put the data into a new.csv. Once this is done, the new.csv should have 16 columns (2 from each input.csv) and "n" rows.

Now, I want to just take the average of Column[1,3,5,7,9,11,13,15] from new.csv and append it to another file or this file. Basically in the new csv, i want to have the average of colum[8] form the input files and each of the column[0] from the input files. So the final output expecting should have a shape of 9 columns and n rows. sample input files:

a.csv:
5.42E+05    6.52E+05    2.17E+04    2.73E+04    2.58E+04    2.33E+04    2.81E+04    3.37E+04    1.08E+08    1.10E+08    2.54E+05    3.21E+05    2.99E+05    2.74E+05    3.39E+05    4.07E+05
4.64E+04    1.15E+06    1.96E+04    2.53E+04    2.39E+04    2.37E+04    1.98E+04    2.85E+04    6.18E+05    2.17E+08    2.30E+05    3.02E+05    2.75E+05    2.77E+05    2.33E+05    3.42E+05
4.36E+04    1.13E+06    5.72E+04    2.71E+04    2.77E+04    2.37E+04    2.62E+04    7.35E+04    5.78E+05    2.17E+08    9.26E+05    3.25E+05    3.20E+05    2.72E+05    3.20E+05    1.46E+06
4.32E+04    1.02E+06    1.47E+05    2.63E+04    3.05E+04    2.26E+04    2.89E+04    2.45E+04    5.70E+05    2.15E+08    2.78E+06    3.02E+05    3.58E+05    2.63E+05    3.49E+05    2.87E+05
4.44E+04    7.83E+05    1.58E+05    2.95E+04    2.71E+05    2.71E+04    3.67E+04    2.85E+04    5.86E+05    1.61E+08    2.89E+06    3.48E+05    5.39E+07    3.14E+05    4.49E+05    3.39E+05
1.47E+05    1.02E+06    2.09E+04    2.72E+04    2.66E+04    6.18E+04    3.50E+04    3.00E+04    2.72E+06    2.15E+08    2.46E+05    3.18E+05    3.07E+05    9.91E+05    7.18E+05    3.71E+05
1.81E+05    7.67E+05    1.94E+04    5.05E+04    2.62E+04    4.50E+04    2.92E+04    2.86E+04    3.16E+06    1.61E+08    2.28E+05    4.84E+06    3.10E+05    5.31E+06    3.49E+05    3.58E+05
4.94E+05    1.34E+05    6.99E+04    8.76E+05    5.51E+04    5.27E+04    3.34E+05    1.30E+05    1.35E+07    3.59E+06    1.66E+06    1.64E+08    1.03E+06    1.12E+06    5.56E+07    3.37E+06
4.79E+04    1.38E+05    2.66E+04    1.02E+06    2.85E+04    2.88E+04    2.89E+04    3.26E+04    6.12E+05    2.72E+06    3.21E+05    2.15E+08    3.29E+05    3.39E+05    3.40E+05    4.04E+05
4.51E+04    6.44E+05    3.02E+04    5.24E+05    2.72E+04    1.89E+04    2.42E+04    3.21E+04    5.97E+05    1.10E+08    3.65E+05    1.07E+08    3.17E+05    2.17E+05    2.85E+05    3.80E+05


b.csv:
    4.25E+03    1.83E+03    1.09E+03    1.35E+03    1.18E+03    1.24E+03    1.16E+03    1.28E+03    1.08E+08    1.10E+08    2.51E+05    3.13E+05    2.80E+05    2.64E+05    3.23E+05    3.32E+05
    4.47E+03    2.20E+03    1.16E+03    1.46E+03    1.28E+03    1.21E+03    1.17E+03    1.36E+03    6.01E+05    2.17E+08    2.92E+05    3.59E+05    3.34E+05    2.84E+05    3.14E+05    3.86E+05
    5.12E+03    1.85E+03    1.62E+03    1.59E+03    1.93E+03    1.36E+03    1.36E+03    1.42E+03    7.19E+05    2.16E+08    1.60E+06    7.14E+06    7.10E+05    8.74E+05    8.67E+05    1.37E+06
    4.32E+03    1.53E+03    2.03E+03    1.11E+03    1.18E+03    1.18E+03    1.52E+03    1.18E+03    5.81E+05    2.15E+08    2.70E+06    2.84E+05    3.24E+05    3.12E+05    4.25E+05    3.65E+05
    4.64E+03    1.53E+03    2.07E+03    1.15E+03    1.15E+03    1.25E+03    1.50E+03    1.13E+03    1.17E+06    1.61E+08    2.74E+06    2.98E+05    2.82E+05    5.38E+07    4.16E+05    3.41E+05
    5.03E+03    1.61E+03    1.17E+03    1.15E+03    1.02E+03    1.12E+03    1.40E+03    1.43E+03    2.56E+06    2.16E+08    2.37E+05    2.57E+05    2.43E+05    2.65E+05    4.03E+05    4.43E+05
    5.11E+03    1.37E+03    1.24E+03    1.20E+03    1.21E+03    1.10E+03    1.28E+03    1.34E+03    3.09E+06    1.61E+08    2.84E+05    2.93E+05    2.91E+05    2.34E+05    5.40E+07    3.07E+05
    5.79E+03    2.51E+03    2.15E+03    2.21E+03    3.57E+03    1.67E+03    2.61E+03    2.28E+03    3.08E+06    4.98E+06    3.60E+06    1.63E+08    7.06E+06    1.95E+06    5.74E+07    3.44E+06
    4.49E+03    1.88E+03    1.22E+03    1.47E+03    1.23E+03    1.04E+03    1.42E+03    1.37E+03    6.11E+05    2.67E+06    2.93E+05    2.15E+08    3.31E+05    2.26E+05    4.13E+05    3.53E+05
    4.50E+03    2.22E+03    1.40E+03    1.34E+03    1.26E+03    1.22E+03    1.18E+03    1.35E+03    6.43E+05    1.10E+08    3.31E+05    1.07E+08    3.50E+05    3.29E+05    3.69E+05    4.26E+05



c.csv:
    1.30E+06    4.34E+05    4.66E+04    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    1.62E+08    5.65E+07    6.02E+06    3.24E+05    3.55E+05    2.83E+05    3.41E+05    4.05E+05
    0.00E+00    1.74E+06    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    5.61E+05    2.17E+08    3.12E+05    3.34E+05    2.83E+05    2.83E+05    3.01E+05    3.45E+05
    0.00E+00    1.74E+06    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    6.08E+05    2.17E+08    8.92E+05    3.47E+05    3.43E+05    2.22E+05    3.64E+05    2.38E+05
    0.00E+00    1.74E+06    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    5.61E+05    2.15E+08    2.90E+06    3.35E+05    3.08E+05    5.85E+05    3.60E+05    3.81E+05
    0.00E+00    1.74E+06    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    5.45E+05    2.15E+08    2.90E+06    3.11E+05    3.06E+05    2.88E+05    3.73E+05    3.10E+05
    0.00E+00    1.30E+06    0.00E+00    0.00E+00    0.00E+00    0.00E+00    0.00E+00    9.22E+04    4.90E+06    1.65E+08    8.92E+05    3.07E+06    1.37E+06    3.40E+06    1.53E+06    1.52E+07
    0.00E+00    1.74E+06    0.00E+00    4.69E+04    0.00E+00    0.00E+00    0.00E+00    0.00E+00    3.09E+06    2.15E+08    3.08E+05    6.15E+06    3.48E+05    3.63E+05    3.85E+05    4.12E+05
    0.00E+00    0.00E+00    0.00E+00    1.31E+06    0.00E+00    0.00E+00    4.36E+05    0.00E+00    3.06E+06    1.35E+06    2.31E+05    1.61E+08    2.89E+05    2.05E+05    5.41E+07    1.77E+06
    0.00E+00    0.00E+00    0.00E+00    1.74E+06    0.00E+00    0.00E+00    0.00E+00    0.00E+00    6.69E+05    2.27E+06    3.02E+05    2.16E+08    3.27E+05    3.08E+05    3.50E+05    3.75E+05
    0.00E+00    8.69E+05    0.00E+00    8.71E+05    0.00E+00    0.00E+00    0.00E+00    0.00E+00    6.68E+05    1.10E+08    3.07E+05    1.08E+08    3.67E+05    2.34E+05    3.71E+05    3.78

Final expected output (after averaging column 8):
5.42E+05    4.25E+03    1.30E+06    125650487
4.64E+04    4.47E+03    0.00E+00    593233.3333
4.36E+04    5.12E+03    0.00E+00    634780
4.32E+04    4.32E+03    0.00E+00    570865
4.44E+04    4.64E+03    0.00E+00    766418
1.47E+05    5.03E+03    0.00E+00    3393342.667
1.81E+05    5.11E+03    0.00E+00    3113608.333
4.94E+05    5.79E+03    0.00E+00    6532673.333
4.79E+04    4.49E+03    0.00E+00    630900.3333
4.51E+04    4.50E+03    0.00E+00    636023

Then I was to do the looping for all the 16 columns (taking the following sets as[n,n+8] where n=0 to 7.

Apologize for the lengthy description, but I can't seem to append the columns in python. Thanks in advance.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Here is a sample code I started with:

import csv
import numpy as np
import sys
import pandas as pd
import glob
damn = ("a", "b", "c","e","f","g","h","i")
data = []

for fles in range(len(damn)):
    core0data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(0,8))
    #core1data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(1,9))
    #core2data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(2,10))
    #core3data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(3,11))
    #core4data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(4,12))
    #core5data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(5,13))
    #core6data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(6,14))
    #core7data = np.genfromtxt('./%s_raw_combine.csv'%damn[fles], dtype=float, delimiter=',',usecols=(7,15))
    data.append(core0data)

np.savetxt("writer.csv", data, delimiter= ",")

But after running this, I get error:

 > python2.7 new.py 
Traceback (most recent call last):
  File "new.py", line 20, in <module>
    np.savetxt("writer.csv", data, delimiter= ",")
  File "~/anaconda/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1083, in savetxt
    fh.write(asbytes(format % tuple(row) + newline))
TypeError: float argument required, not numpy.ndarray

Upvotes: 1

Views: 918

Answers (2)

Padraic Cunningham
Padraic Cunningham

Reputation: 180411

This will join the rows and write them to a new csv:

import csv

from glob import iglob
from itertools import chain

data = []

for file in iglob("*.csv"):
    with open(file) as f:
        r = csv.reader(f)
        data.append(list(chain.from_iterable((float(row[0]), float(row[8])) for row in r)))


with open("new.csv","w") as out:
    wr = csv.writer(out)
    wr.writerows(zipped)

Passing the data to pandas:

data = []
for file in iglob("*.csv"):
    with open(file) as f:
        r = csv.reader(f)
        data.append(list(chain.from_iterable((float(row[0]), float(row[8])) for row in r)))

zipped = zip(*data)


import pandas as pd

df = pd.DataFrame(zipped)

print(df[0].mean())
print(df[1].mean())
print(df[2].mean())
print(df)

Output:

69610.0
3093.0
103830.0
            0          1          2
0     1300000       4250     542000
1   162000000  108000000  108000000
2           0       4470      46400
3      561000     601000     618000
4           0       5120      43600
5      608000     719000     578000
6           0       4320      43200
7      561000     581000     570000
8           0       4640      44400
9      545000    1170000     586000
10          0       5030     147000
11    4900000    2560000    2720000
12          0       5110     181000
13    3090000    3090000    3160000
14          0       5790     494000
15    3060000    3080000   13500000
16          0       4490      47900
17     669000     611000     612000
18          0       4500      45100
19     668000     643000     597000

get the mean of each row:

print(df.mean(1))

Output:

0     615416.666667
1      11660.000000
2      16956.666667
3       9953.333333
4      16240.000000
5      24973.333333
6      15840.000000
7       8560.000000
8      16346.666667
9       9876.666667
10     50676.666667
11     41210.000000
12     62036.666667
13      9980.000000
14    166596.666667
15     44093.333333
16     17463.333333
17     11323.333333
18     16533.333333
19     11150.000000
dtype: float64

Add that column:

df[3] = df.mean(1)

print(df)

Output:

         0     1       2              3
0   1300000  4250  542000  615416.666667
1         0  1280   33700   11660.000000
2         0  4470   46400   16956.666667
3         0  1360   28500    9953.333333
4         0  5120   43600   16240.000000
5         0  1420   73500   24973.333333
6         0  4320   43200   15840.000000
7         0  1180   24500    8560.000000
8         0  4640   44400   16346.666667
9         0  1130   28500    9876.666667
10        0  5030  147000   50676.666667
11    92200  1430   30000   41210.000000
12        0  5110  181000   62036.666667
13        0  1340   28600    9980.000000
14        0  5790  494000  166596.666667
15        0  2280  130000   44093.333333
16        0  4490   47900   17463.333333
17        0  1370   32600   11323.333333
18        0  4500   45100   16533.333333
19        0  1350   32100   11150.000000

Save to csv:

df.to_("new.csv",sep=" ")

Output:

 0 1 2 3
0 1300000.0 4250.0 542000.0 615416.6666666666
1 0.0 1280.0 33700.0 11660.0
2 0.0 4470.0 46400.0 16956.666666666668
3 0.0 1360.0 28500.0 9953.333333333334
4 0.0 5120.0 43600.0 16240.0
5 0.0 1420.0 73500.0 24973.333333333332
6 0.0 4320.0 43200.0 15840.0
7 0.0 1180.0 24500.0 8560.0
8 0.0 4640.0 44400.0 16346.666666666666
9 0.0 1130.0 28500.0 9876.666666666666
10 0.0 5030.0 147000.0 50676.666666666664
11 92200.0 1430.0 30000.0 41210.0
12 0.0 5110.0 181000.0 62036.666666666664
13 0.0 1340.0 28600.0 9980.0
14 0.0 5790.0 494000.0 166596.66666666666
15 0.0 2280.0 130000.0 44093.333333333336
16 0.0 4490.0 47900.0 17463.333333333332
17 0.0 1370.0 32600.0 11323.333333333334
18 0.0 4500.0 45100.0 16533.333333333332
19 0.0 1350.0 32100.0 11150.0

If you don't want the names and row indexes:

 df.to_csv("new.csv",sep=" ",index=False,header=False)

Output:

1300000.0 4250.0 542000.0 615416.6666666666
0.0 1280.0 33700.0 11660.0
0.0 4470.0 46400.0 16956.666666666668
0.0 1360.0 28500.0 9953.333333333334
0.0 5120.0 43600.0 16240.0
0.0 1420.0 73500.0 24973.333333333332
0.0 4320.0 43200.0 15840.0
0.0 1180.0 24500.0 8560.0
0.0 4640.0 44400.0 16346.666666666666
0.0 1130.0 28500.0 9876.666666666666
0.0 5030.0 147000.0 50676.666666666664
92200.0 1430.0 30000.0 41210.0
0.0 5110.0 181000.0 62036.666666666664
0.0 1340.0 28600.0 9980.0
0.0 5790.0 494000.0 166596.66666666666
0.0 2280.0 130000.0 44093.333333333336
0.0 4490.0 47900.0 17463.333333333332
0.0 1370.0 32600.0 11323.333333333334
0.0 4500.0 45100.0 16533.333333333332
0.0 1350.0 32100.0 11150.0

Upvotes: 2

Dickster
Dickster

Reputation: 3009

Try out pandas. Among the many benefits, will be the explicit column labeling and column slicing.

Haven't fully replicated but maybe we can work to carve a full pandas solution.

import pandas as pd

filenames = ['a.csv','b.csv','c.csv']
for i,filename in enumerate(filenames):
    df = pd.read_csv(filename,header=None)
    df.columns = df.columns + 1 #
    filenames[i] = df

dfs = pd.concat(filenames,axis=1)

print dfs.loc[:,[1,8]].head()

cols = [1,3]
print dfs[cols].mean()

Upvotes: 0

Related Questions