Reputation: 5084
I have a csv file as shown below:
19/04/2015 00:00 180 187 85 162 608 61
19/04/2015 01:00 202 20 26 70 171 61
19/04/2015 02:00 20 40 40 11 40 810
19/04/2015 03:00 20 80 81 24 0 86
19/04/2015 04:00 25 30 70 91 07 50
19/04/2015 05:00 80 611 691 70 790 37
19/04/2015 06:00 199 69 706 70 790 171
19/04/2015 07:00 80 81 90 192 57 254
19/04/2015 08:00 40 152 454 259 52 151
Each row is in the same cell in the file.
I'm trying to make it look like this:
19/04/2015 00:00 180
19/04/2015 00:10 187
19/04/2015 00:20 85
19/04/2015 00:30 162
19/04/2015 00:40 608
19/04/2015 00:50 61
19/04/2015 01:00 202
etc..
Explaination:
The first list of numbers is a date dd/M/YYYY HH:mm
with 6 values, each value per 10 minutes.
In the second presentation, I wanted to have the date of each value with the exact time with minutes.
Here is what I've tried so far:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys, getopt
import tarfile
import re
import pandas as pd
import tempfile
import shutil
import collections
import urllib
import numpy
import logging
import csv
csvFile = "testfile.csv"
data = []
minutes = ['00:00','10:00','20:00','30:00','40:00','50:00']
with open(csvFile, 'rb') as csvfile:
reader = csv.reader(csvfile, delimiter=',')
for row in reader:
row[0] = re.sub("\s+", ";", row[0].strip())
rowlist = row[0].split(';')
while(len(rowlist)<8):
rowlist.append(0)
for i in range(len(rowlist)):
for m in minutes:
data.append(rowlist[0]+rowlist[1]+m)
data.append(rowlist[i])
df = pd.DataFrame(data)
df.to_csv('example.csv')
But this code didn't give me the desired result. Any suggestions?
Upvotes: 2
Views: 124
Reputation: 2642
Okay I'm going to be explaining a lot in this one.
I Highly recommend you to use datetime
objects if you are going to deal with dates. Because that's exactly why they are in the first place. Convert them into datetime object and you could do lots and lots of manipulations.
This a complete working code for you. I'll explain all of the concepts in depth!.
Input:
19/04/2015 00:00, 180 , 187 , 85 , 162 , 608 , 61
19/04/2015 01:00, 202 , 20 , 26 , 70 , 171 , 61
19/04/2015 02:00, 20 , 40 , 40 , 11 , 40 , 810
The code:
import csv
from datetime import datetime,timedelta
list_of_list = []
with open("old_file.csv","r+") as my_csv:
for line in my_csv:
line = line.strip().replace(" ",'').split(',')
list_of_list.append(line)
for item in list_of_list:
dt = datetime.strptime(item[0],'%d/%m/%Y%H:%M')
item[0]=dt
fin_list = []
for item in list_of_list:
temp_list = [item[0]+timedelta(minutes=10*(i)) for i,x in enumerate(item)]
my_list = [list(a) for a in zip(temp_list,item[1:])]
fin_list.extend(my_list)
for item in fin_list:
item[0] = datetime.strftime(item[0],"%d/%m/%Y %H:%M")
item[0].replace('"','')
print(fin_list)
with open("new_file.csv","w+") as my_csv:
csvWriter = csv.writer(my_csv,delimiter=' ',quotechar = " ")
csvWriter.writerows(fin_list)
output:
19/04/2015 00:00 180
19/04/2015 00:10 187
19/04/2015 00:20 85
19/04/2015 00:30 162
19/04/2015 00:40 608
19/04/2015 00:50 61
19/04/2015 01:00 202
19/04/2015 01:10 20
19/04/2015 01:20 26
19/04/2015 01:30 70
19/04/2015 01:40 171
19/04/2015 01:50 61
19/04/2015 02:00 20
19/04/2015 02:10 40
19/04/2015 02:20 40
19/04/2015 02:30 11
19/04/2015 02:40 40
19/04/2015 02:50 810
1) See I'm taking each row and making them into a list. Also replacing and stripping all the whitespaces
,\n
,\r
line = line.strip().replace(" ",'').split(',')
list_of_list.append(line)
output after this:
['19/04/201500:00', '180', '187', '85', '162', '608']
2) dt = datetime.strptime(item[0],'%d/%m/%Y%H:%M')
what's this? the strptime
from datetime takes a string and converts it into a datetime object which you can manipulate easily.
Example:
>>> datetime.strptime('19/04/201500:00','%d/%m/%Y%H:%M')
>>> datetime.datetime(2015, 4, 19, 0, 0)
>>> datetime.strptime('19/04/2015 00:00','%d/%m/%Y %H:%M') #notice how this is different from above!
>>> datetime.datetime(2015, 4, 19, 0, 0)
>>> datetime.strptime('Apr 19 2015 12:00','%b %d %Y %H:%M')
>>> datetime.datetime(2015, 4, 19, 12, 0)
Can you see how it transformed? Once you change it into a datetime
object you can then easily add minutes,days,hours,months anything you want with it!.
But to add them you need a timedelta
object. Consider it like this to an integer
you add an integer
same way to datetime
add timedelta
.
[item[0]+timedelta(minutes=10*(i)) for i,x in enumerate(item)]
You might be thinking hey what the hell's this?.enumerate
of a iterable (list,string,tuple..etc
) gives two things i,element. Where i would be 0,1,2,3,....till last index of iterable (here list)
. So first i,x
would be 0,item[0]
next i,x
would be 1,item[1]
and so on.
So the list comprehension just adds 0,10,20 ,30,40,..
minutes to every datetime object.
Each item would be the below,
[datetime.datetime(2015, 4, 19, 0, 0), '180']
And finally after extend
you get this:
[[datetime.datetime(2015, 4, 19, 0, 0), '180'],
[datetime.datetime(2015, 4, 19, 0, 10), '187'],
[datetime.datetime(2015, 4, 19, 0, 20), '85'],
[datetime.datetime(2015, 4, 19, 0, 30), '162'],
[datetime.datetime(2015, 4, 19, 0, 40), '608'],
[datetime.datetime(2015, 4, 19, 0, 50), '61']]
How beautiful?
Now again convert the datetime
objects to string using this,
item[0] = datetime.strftime(item[0],"%d/%m/%Y %H:%M")
So strftime
converts it into desired format!. And alas write them in the new csv file using csv writer
.
Note: This would print dates along with quotes by default!. Which you didn't want in your output so use quotechar = " "
to remove them.
Upvotes: 2
Reputation: 1053
This should work:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
in_name = 'test.csv'
out_name = 'sample.csv'
with open(in_name, 'rb') as infile, open(out_name, 'wb') as out_file:
for line in infile:
parts = line.split()
date, time, data = parts[0], parts[1], parts[2:]
hours, _ = time.split(':')
for minutes, value in zip(range(0, 60, 10), data):
out_file.write('{date} {hours}:{minutes:02d} {value:>5}\n'.format(
date=date, hours=hours, minutes=minutes, value=value
))
You also had a lot of unused imports which were unnecessary and could reduce performance.
Upvotes: -1