Gabriel Butoeru
Gabriel Butoeru

Reputation: 131

CSV files and Python

I'm working on a Python script that should merge some columns of some CSV files (a lot, something around 200 files). All the files look like:

Timestamp; ...; ...; ...; Value; ...
date1;...;...;...; FirstValue;...
date2;...;...;...; SecondValue;...

and so on.

From the first file I want to extract the timestamp and the column Value. From the other files I need only the column Values.

My script for now is:

#!/usr/bin/python
import csv
import os, sys

# Open a file
path = "Z:/myfolder"
dirs = os.listdir( path )
#Conto il numero di file nella cartella
print len(dirs)
#Assegno il nome del primo file
file = dirs[0]

#Apro il primo file per la lettura di timestamp e primo valore (Value)
primofile = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
timestamp, firstValue = [], []
#Per ogni riga del primofile
for row in primofile:
    #Copio timestamp
    timestamp.append(row[2])
    #e Value
    firstValue.append(row[15])

with open("provacript.csv", 'wb') as f:
    writer = csv.writer(f, delimiter=';')
    i = 0
    while i < len(timestamp):
        writer.writerow([timestamp[i]] + [firstValue[i]])
        i = i+1

So in "provascript.csv" I have the timestamp and the first column with my values from the first file. The next step is to open, one by one, the files in the list "dirs", read the column "Values" (the 15th column), save this column in an array and write it in "provascript.csv".

My code is:

for file in dirs:
data = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
column = []
for row in data:
    column.append(row[15])

In the array "column" I should have the values. I have to add this values in a new column in "provascript.csv" and move on doing the same thing with all the files. How can I do that?

I would like to have something like

TimestampFromFirstFile;ValueFromFirstFile;ValueFromSecondFile;ValueFromThirdFile;...
date1;value;value,value;...
date2;value;value;value;...
date3;value;value;value;...

So far so good. I fixed it (thanks), but instead of reading and writing Value in the first row I would like to write a part of the name. Instead of having Timestamp;Value;Value;Value I would prefer Timestamp;Temperature1;Temperature2;Presence1;Presence2.

How can I do it?

Upvotes: 1

Views: 571

Answers (2)

jrjc
jrjc

Reputation: 21883

You can do it with Pandas :

file1 = pd.read_csv("file1", index_col=0, sep=";", skipinitialspace=1)
file2 = pd.read_csv("file2", index_col=0, sep=";", skipinitialspace=1)
file3 = pd.read_csv("file3", index_col=0, sep=";", skipinitialspace=1)

here, you have plenty of options, notably to parse date while reading your csv.

file 1 being :

           ...  ....1 ....2      Value  ....3
Timestamp                                    
date1      ...   ...   ...   FirstValue   ...
date2      ...   ...   ...  SecondValue   ...

f1 = pd.DataFrame(file1.Value)
f2 = pd.DataFrame(file2.Value)
f3 = pd.DataFrame(file3.Value)

f2

          Value
Timestamp      
date1       AAA
date2       BBB

f3
           Value
Timestamp       
date1        456
date2        123

Then you define a function for recursive merge :

def recursive_merge(list_df):

    suffixe = range(1,len(list_df)+1)
    merged  = list_df[0]
    for i in range(1,len(list_df)):
        merged = merged.merge(list_df[i], left_index=True, right_index=True,
                                          suffixes=('_%s' %suffixe[i-1], '_%s' %suffixe[i]))
    if len(list_df)%2 !=0 : 
        merged.rename(
            columns = {'Value':"Value_%s" %suffixe[i]}, 
            inplace = True) # if number of recursive merge is odd 
    return merged

and call :

recursive_merge([f1,f2,f3])

Output :

               Value_1 Value_2  Value_3
Timestamp                              
date1       FirstValue     AAA      456
date2      SecondValue     BBB      123

And then you can easily write that dataframe with :

recursive_merge([f1,f2,f3]).to_csv("output.csv")

Of course if you have more than 3 files, you can make for-loops and or functions to open files and end up with a list like [f1,f2,f3,...f200]

Hope this helps

Upvotes: 2

xecgr
xecgr

Reputation: 5193

I should create the full structure and finally i will save it in the output file (assuming that files are ordered between them)

#create the full structure:  output_rows
primofile = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
output_rows = []
for row in primofile:
    output_rows.append([row[2], row[15]])

Once we have an ordered list of lists, complete them with the other files

for file in dirs:
    data = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
    column = []
    for idx,row in enumerate(data):
        output_rows[idx].append(row[15])

Finally save it to a file

with open("output.csv", 'wb') as f:
    writer = csv.writer(f, delimiter=';')
    for row in output_rows:
        writer.writerow(row)

Upvotes: 3

Related Questions