Souad
Souad

Reputation: 5084

Create CSV file from another

I have a csv file as shown below:

19/04/2015 00:00         180         187         85         162          608          61
19/04/2015 01:00          202         20         26          70         171          61
19/04/2015 02:00          20          40         40          11          40         810
19/04/2015 03:00          20          80          81         24          0          86
19/04/2015 04:00          25          30          70          91          07          50
19/04/2015 05:00          80         611          691          70          790          37
19/04/2015 06:00         199          69          706          70          790         171
19/04/2015 07:00          80          81          90         192          57         254
19/04/2015 08:00          40         152          454         259          52         151

Each row is in the same cell in the file.

I'm trying to make it look like this:

19/04/2015 00:00   180 
19/04/2015 00:10   187
19/04/2015 00:20    85
19/04/2015 00:30   162 
19/04/2015 00:40   608
19/04/2015 00:50    61
19/04/2015 01:00   202    

etc..

Explaination:

The first list of numbers is a date dd/M/YYYY HH:mm with 6 values, each value per 10 minutes. In the second presentation, I wanted to have the date of each value with the exact time with minutes.

Here is what I've tried so far:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import sys, getopt
import tarfile
import re
import pandas as pd
import tempfile
import shutil
import collections
import urllib
import numpy
import logging
import csv


csvFile = "testfile.csv"
data = []
minutes = ['00:00','10:00','20:00','30:00','40:00','50:00']
with open(csvFile, 'rb') as csvfile:
  reader = csv.reader(csvfile, delimiter=',')
  for row in reader:
    row[0] = re.sub("\s+", ";", row[0].strip())
    rowlist = row[0].split(';')
    while(len(rowlist)<8):
        rowlist.append(0)
    for i in range(len(rowlist)):
        for m in minutes:
            data.append(rowlist[0]+rowlist[1]+m)
            data.append(rowlist[i])
    df = pd.DataFrame(data)
    df.to_csv('example.csv')

But this code didn't give me the desired result. Any suggestions?

Upvotes: 2

Views: 124

Answers (2)

void
void

Reputation: 2642

Okay I'm going to be explaining a lot in this one.

I Highly recommend you to use datetime objects if you are going to deal with dates. Because that's exactly why they are in the first place. Convert them into datetime object and you could do lots and lots of manipulations.

This a complete working code for you. I'll explain all of the concepts in depth!.

Input:

19/04/2015 00:00,         180 ,        187  ,       85    ,     162  ,        608 ,         61
19/04/2015 01:00,          202 ,        20  ,       26    ,      70  ,       171   ,       61
19/04/2015 02:00,          20  ,        40  ,       40    ,      11   ,       40  ,       810

The code:

import csv
from datetime import datetime,timedelta
list_of_list = []
with open("old_file.csv","r+") as my_csv:

    for line in my_csv:
        line = line.strip().replace(" ",'').split(',')
        list_of_list.append(line)



for item in list_of_list:
    dt = datetime.strptime(item[0],'%d/%m/%Y%H:%M')
    item[0]=dt

fin_list = []    
for item in list_of_list:
    temp_list = [item[0]+timedelta(minutes=10*(i)) for i,x in enumerate(item)]
    my_list = [list(a) for a in zip(temp_list,item[1:])]
    fin_list.extend(my_list)

for item in fin_list:
    item[0] = datetime.strftime(item[0],"%d/%m/%Y %H:%M")
    item[0].replace('"','')
print(fin_list)


with open("new_file.csv","w+") as my_csv:
    csvWriter = csv.writer(my_csv,delimiter=' ',quotechar = " ")
    csvWriter.writerows(fin_list)  

output:

 19/04/2015  00:00  180
 19/04/2015  00:10  187
 19/04/2015  00:20  85
 19/04/2015  00:30  162
 19/04/2015  00:40  608
 19/04/2015  00:50  61
 19/04/2015  01:00  202
 19/04/2015  01:10  20
 19/04/2015  01:20  26
 19/04/2015  01:30  70
 19/04/2015  01:40  171
 19/04/2015  01:50  61
 19/04/2015  02:00  20
 19/04/2015  02:10  40
 19/04/2015  02:20  40
 19/04/2015  02:30  11
 19/04/2015  02:40  40
 19/04/2015  02:50  810

1) See I'm taking each row and making them into a list. Also replacing and stripping all the whitespaces,\n,\r

line = line.strip().replace(" ",'').split(',')
            list_of_list.append(line)

output after this:

['19/04/201500:00', '180', '187', '85', '162', '608']

2) dt = datetime.strptime(item[0],'%d/%m/%Y%H:%M') what's this? the strptime from datetime takes a string and converts it into a datetime object which you can manipulate easily.

Example:

>>> datetime.strptime('19/04/201500:00','%d/%m/%Y%H:%M')
>>> datetime.datetime(2015, 4, 19, 0, 0)
>>> datetime.strptime('19/04/2015 00:00','%d/%m/%Y %H:%M') #notice how this is different from above!
>>> datetime.datetime(2015, 4, 19, 0, 0)
>>> datetime.strptime('Apr 19 2015 12:00','%b %d %Y %H:%M')
>>> datetime.datetime(2015, 4, 19, 12, 0)

Can you see how it transformed? Once you change it into a datetime object you can then easily add minutes,days,hours,months anything you want with it!.

But to add them you need a timedelta object. Consider it like this to an integer you add an integer same way to datetime add timedelta.

[item[0]+timedelta(minutes=10*(i)) for i,x in enumerate(item)]

You might be thinking hey what the hell's this?.enumerate of a iterable (list,string,tuple..etc) gives two things i,element. Where i would be 0,1,2,3,....till last index of iterable (here list) . So first i,x would be 0,item[0] next i,x would be 1,item[1] and so on.

So the list comprehension just adds 0,10,20 ,30,40,.. minutes to every datetime object.

Each item would be the below,

[datetime.datetime(2015, 4, 19, 0, 0), '180']

And finally after extend you get this:

[[datetime.datetime(2015, 4, 19, 0, 0), '180'],
 [datetime.datetime(2015, 4, 19, 0, 10), '187'],
 [datetime.datetime(2015, 4, 19, 0, 20), '85'],
 [datetime.datetime(2015, 4, 19, 0, 30), '162'],
 [datetime.datetime(2015, 4, 19, 0, 40), '608'],
 [datetime.datetime(2015, 4, 19, 0, 50), '61']]

How beautiful?

Now again convert the datetime objects to string using this,

item[0] = datetime.strftime(item[0],"%d/%m/%Y %H:%M")

So strftime converts it into desired format!. And alas write them in the new csv file using csv writer.

Note: This would print dates along with quotes by default!. Which you didn't want in your output so use quotechar = " " to remove them.

Upvotes: 2

Adeel Ahmad
Adeel Ahmad

Reputation: 1053

This should work:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

in_name = 'test.csv'
out_name = 'sample.csv'

with open(in_name, 'rb') as infile, open(out_name, 'wb') as out_file:
    for line in infile:
        parts = line.split()
        date, time, data = parts[0], parts[1], parts[2:]
        hours, _ = time.split(':')

        for minutes, value in zip(range(0, 60, 10), data):
            out_file.write('{date} {hours}:{minutes:02d} {value:>5}\n'.format(
                date=date, hours=hours, minutes=minutes, value=value
            ))

You also had a lot of unused imports which were unnecessary and could reduce performance.

Upvotes: -1

Related Questions