Reputation: 305

formating file with hours and date in the same column

our electricity provider think it could be very fun to make difficult to read csv files they provide.

This is precise electric consumption, every 30 min but in the SAME column you have hours, and date, example :

[EDIT : here the raw version of the csv file, my bad]

;
"Récapitulatif de mes puissances atteintes en W";
;
"Date et heure de relève par le distributeur";"Puissance atteinte (W)"
;
"19/11/2022";
"00:00:00";4494
"23:30:00";1174
"23:00:00";1130
[...]
"01:30:00";216
"01:00:00";2672
"00:30:00";2816
;
"18/11/2022";
"00:00:00";4494
"23:30:00";1174
"23:00:00";1130
[...]
"01:30:00";216
"01:00:00";2672
"00:30:00";2816

How damn can I obtain this kind of lovely formated file :

2022-11-19 00:00:00 2098
2022-11-19 23:30:00 218
2022-11-19 23:00:00 606

etc.

Upvotes: 0

Answers (3)

Andrej Kesely

Reputation: 195613

Try:

import pandas as pd

current_date = None
all_data = []
with open("your_file.txt", "r") as f_in:
    # skip first 5 rows (header)
    for _ in range(5):
        next(f_in)

    for row in map(str.strip, f_in):
        row = row.replace('"', "")
        if row == "":
            continue
        if "/" in row:
            current_date = row
        else:
            all_data.append([current_date, *row.split(";")])

df = pd.DataFrame(all_data, columns=["Date", "Time", "Value"])
print(df)

Prints:

           Date      Time Value
0   19/11/2022;  00:00:00  4494
1   19/11/2022;  23:30:00  1174
2   19/11/2022;  23:00:00  1130
3   19/11/2022;  01:30:00   216
4   19/11/2022;  01:00:00  2672
5   19/11/2022;  00:30:00  2816
6   18/11/2022;  00:00:00  4494
7   18/11/2022;  23:30:00  1174
8   18/11/2022;  23:00:00  1130
9   18/11/2022;  01:30:00   216
10  18/11/2022;  01:00:00  2672
11  18/11/2022;  00:30:00  2816

Upvotes: 1

Meti

Reputation: 2056

Using pandas operations would be like the following:

data.csv

19/11/2022  
00:00:00    2098
23:30:00    218
23:00:00    606
01:30:00    216
01:00:00    2672
00:30:00    2816
18/11/2022  
00:00:00    1994
23:30:00    260
23:00:00    732
01:30:00    200
01:00:00    1378
00:30:00    2520
17/11/2022  
00:00:00    1830
23:30:00    96
23:00:00    122
01:30:00    694
01:00:00    2950
00:30:00    3062
16/11/2022  
00:00:00    2420
23:30:00    678
23:00:00    644

Implementation

import pandas as pd
df = pd.read_csv('data.csv', header=None)
df['amount'] = df[0].apply(lambda item:item.split(' ')[-1] if item.find(':')>0 else None)
df['time'] = df[0].apply(lambda item:item.split(' ')[0] if item.find(':')>0 else None)
df['date'] = df[0].apply(lambda item:item if item.find('/')>0 else None)
df['date'] = df['date'].fillna(method='ffill')
df = df.dropna(subset=['amount'], how='any')
df = df.drop(0, axis=1)
print(df)

output

   amount      time          date
1    2098  00:00:00  19/11/2022  
2     218  23:30:00  19/11/2022  
3     606  23:00:00  19/11/2022  
4     216  01:30:00  19/11/2022  
5    2672  01:00:00  19/11/2022  
6    2816  00:30:00  19/11/2022  
8    1994  00:00:00  18/11/2022  
9     260  23:30:00  18/11/2022  
10    732  23:00:00  18/11/2022  
11    200  01:30:00  18/11/2022  
12   1378  01:00:00  18/11/2022  
13   2520  00:30:00  18/11/2022  
15   1830  00:00:00  17/11/2022  
16     96  23:30:00  17/11/2022  
17    122  23:00:00  17/11/2022  
18    694  01:30:00  17/11/2022  
19   2950  01:00:00  17/11/2022  
20   3062  00:30:00  17/11/2022  
22   2420  00:00:00  16/11/2022  
23    678  23:30:00  16/11/2022  
24    644  23:00:00  16/11/2022

Upvotes: 0

wallbloggerbeing

Reputation: 105

Okay I have an idiotic brutforce solution for you, so dont take that as coding recommondation but just something that gets the job done:

import itertools
dList = [f"{f}/{s}/2022" for f, s in itertools.product(range(1, 32), range(1, 13))]

i assume you have a text file with that so im just gonna use that:

file = 'yourfilename.txt'
#make sure youre running the program in the same directory as the .txt file
with open(file, "r") as f:
    global lines
    lines = f.readlines()
lines = [word.replace('\n','') for word in lines]
for i in lines:
    if i in dList:
        curD = i
    else:
        with open('output.txt', 'w') as g:
            g.write(f'{i} {(i.split())[0]} {(i.split())[1]}')

make sure to create a file called output.txt in the same directory and everything will get writen into that file.

Upvotes: 1

formating file with hours and date in the same column

Answers (3)

Related Questions