Reputation: 179
I have a directory full of CSV file which contains information about the resources used by virtual machines. These files are named 'VM Resources - 2018-06-01 070001.csv' etc and contain ~400 lines.
I'm currently only interested in extracting the entry for one server (Let's call it MAILSERVER01) and the columns VMName
(index 0), Total Storage (GB)
(index 8) and the date from the file name.
I can use something like:
import os
import glob
import pandas as pd
import numpy as np
path = "my_dir_full_path"
allFiles = glob.glob(os.path.join(path,"*.csv"))
But this doesn't give me the date field that I need. I'm hoping to eventually be able to plug this information into a time series to predict when a server (all 400) is likely to require a disk extension but in the interim need to create the time series data.
I am trying to get an output CSV looking like:
VMName,Total Storage (GB),Date
MAILSERVER01,600,2018-06-01
MAILSERVER01,600,2018-06-02
MAILSERVER01,610,2018-06-03
Upvotes: 0
Views: 45
Reputation: 46779
This could be done using the built in CSV library as follows:
from operator import itemgetter
import csv
import glob
req_fields = itemgetter(0, 8) # Used to extract the two required fields from a list
header = ["VMName", "Total Storage (GB)", "Date"]
with open('output.csv', 'w', newline='') as f_output:
csv_output = csv.writer(f_output)
csv_output.writerow(header)
for filename in glob.glob(r'VM Resources*.csv'):
date = filename.split()[3] # split on spaces and take the 4th part
with open(filename, newline='') as f_input:
csv_input = csv.reader(f_input)
header = next(csv_input) # skip the header
for row in csv_input:
output_row = list(req_fields(row)) + [date]
csv_output.writerow(output_row)
Giving you an output.csv
file that would look something like:
VMName,Total Storage (GB),Date
MAILSERVER01,600,2018-06-01
MAILSERVER01,600,2018-06-02
MAILSERVER01,610,2018-06-03
Tested on Python 3.6.7
Upvotes: 1