Reputation: 131
I'm working on a Python script that should merge some columns of some CSV files (a lot, something around 200 files). All the files look like:
Timestamp; ...; ...; ...; Value; ...
date1;...;...;...; FirstValue;...
date2;...;...;...; SecondValue;...
and so on.
From the first file I want to extract the timestamp and the column Value. From the other files I need only the column Values.
My script for now is:
#!/usr/bin/python
import csv
import os, sys
# Open a file
path = "Z:/myfolder"
dirs = os.listdir( path )
#Conto il numero di file nella cartella
print len(dirs)
#Assegno il nome del primo file
file = dirs[0]
#Apro il primo file per la lettura di timestamp e primo valore (Value)
primofile = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
timestamp, firstValue = [], []
#Per ogni riga del primofile
for row in primofile:
#Copio timestamp
timestamp.append(row[2])
#e Value
firstValue.append(row[15])
with open("provacript.csv", 'wb') as f:
writer = csv.writer(f, delimiter=';')
i = 0
while i < len(timestamp):
writer.writerow([timestamp[i]] + [firstValue[i]])
i = i+1
So in "provascript.csv" I have the timestamp and the first column with my values from the first file. The next step is to open, one by one, the files in the list "dirs", read the column "Values" (the 15th column), save this column in an array and write it in "provascript.csv".
My code is:
for file in dirs:
data = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
column = []
for row in data:
column.append(row[15])
In the array "column" I should have the values. I have to add this values in a new column in "provascript.csv" and move on doing the same thing with all the files. How can I do that?
I would like to have something like
TimestampFromFirstFile;ValueFromFirstFile;ValueFromSecondFile;ValueFromThirdFile;...
date1;value;value,value;...
date2;value;value;value;...
date3;value;value;value;...
So far so good. I fixed it (thanks), but instead of reading and writing Value in the first row I would like to write a part of the name. Instead of having Timestamp;Value;Value;Value I would prefer Timestamp;Temperature1;Temperature2;Presence1;Presence2.
How can I do it?
Upvotes: 1
Views: 571
Reputation: 21883
You can do it with Pandas :
file1 = pd.read_csv("file1", index_col=0, sep=";", skipinitialspace=1)
file2 = pd.read_csv("file2", index_col=0, sep=";", skipinitialspace=1)
file3 = pd.read_csv("file3", index_col=0, sep=";", skipinitialspace=1)
here, you have plenty of options, notably to parse date while reading your csv.
file 1 being :
... ....1 ....2 Value ....3
Timestamp
date1 ... ... ... FirstValue ...
date2 ... ... ... SecondValue ...
f1 = pd.DataFrame(file1.Value)
f2 = pd.DataFrame(file2.Value)
f3 = pd.DataFrame(file3.Value)
f2
Value
Timestamp
date1 AAA
date2 BBB
f3
Value
Timestamp
date1 456
date2 123
Then you define a function for recursive merge :
def recursive_merge(list_df):
suffixe = range(1,len(list_df)+1)
merged = list_df[0]
for i in range(1,len(list_df)):
merged = merged.merge(list_df[i], left_index=True, right_index=True,
suffixes=('_%s' %suffixe[i-1], '_%s' %suffixe[i]))
if len(list_df)%2 !=0 :
merged.rename(
columns = {'Value':"Value_%s" %suffixe[i]},
inplace = True) # if number of recursive merge is odd
return merged
and call :
recursive_merge([f1,f2,f3])
Output :
Value_1 Value_2 Value_3
Timestamp
date1 FirstValue AAA 456
date2 SecondValue BBB 123
And then you can easily write that dataframe with :
recursive_merge([f1,f2,f3]).to_csv("output.csv")
Of course if you have more than 3 files, you can make for-loops and or functions to open files and end up with a list like [f1,f2,f3,...f200]
Hope this helps
Upvotes: 2
Reputation: 5193
I should create the full structure and finally i will save it in the output file (assuming that files are ordered between them)
#create the full structure: output_rows
primofile = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
output_rows = []
for row in primofile:
output_rows.append([row[2], row[15]])
Once we have an ordered list of lists, complete them with the other files
for file in dirs:
data = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
column = []
for idx,row in enumerate(data):
output_rows[idx].append(row[15])
Finally save it to a file
with open("output.csv", 'wb') as f:
writer = csv.writer(f, delimiter=';')
for row in output_rows:
writer.writerow(row)
Upvotes: 3