Reputation: 746
I have many excel sheets in which data is arranged in table as shown below
I want to convert each of this table to another list-like format as shown below.
In this table:
Dir -> Name of the tab
Year -> Same for entire table
DOM -> Day of the month
DOW -> Day of the week.
Hour -> Column label in original table
Traffic Count -> Value in the original table
There are close to 1000 such sheets. The data in each sheet is at the same location. What is the best way to do this? Should I write a VBA script or is there any thing in Excel I can use to make my life easier?
Upvotes: 0
Views: 1177
Reputation: 746
I solved the problem using python and the xlrd module
import xlrd
import numpy as np
from os import listdir, chdir
import calendar
import re
direction = {'N':0,'S':1,'E':2,'W':3}
rend = [0, 40, 37, 40, 39, 40, 39, 40, 40, 39, 40, 39, 40]
# Initialize the matrix to store final result
aData = np.matrix(np.zeros((0,7)))
# get all the xlsx files from the directory.
filenames = [ f for f in listdir('.') if f.endswith('.xlsx') ]
# for each .xlsx in the current directory
for file in filenames:
# The file names are in the format gdot_39446_yyyy_mm.xlsx
# yyyy is the year and mm is the month number with 0 -Jan and 11 - Dec
# I extracted the month and year info from the file name
ms = re.search('.+_.+_.+_([0-9]+)\.',file,re.M).group(1)
month = int(ms) + 1
year = int(file[11:15])
# open the workbook
workbook = xlrd.open_workbook(file)
# the workbook has three sheets. I want information from
# sheet2 and sheet3 (indexed by 1 adn 2 resp.)
for i in range(1,3):
sheet = workbook.sheet_by_index(i)
di = sheet.name[-1]
data = [[sheet.cell_value(r,c) for c in range(2,26)] for r in range(9,rend[month])]
mData = np.matrix(data)
mData[np.where(mData=='')] = 0 # some cells are blank. Insert 0 in those cells
n,d = mData.shape
rows = n * d
rData = np.matrix(np.zeros((rows,7)))
rData[:,0].fill(direction[di])
rData[:,1].fill(year)
rData[:,2].fill(month)
for i in range(rows):
rData[i,3] = (i/24) + 1
rData[i,4] = calendar.weekday(year,month,(i/24) + 1)
rData[i,5] = i%24
for i in range(n):
rData[i*24:((i+1)*24),6] = mData[i,:].T
aData = np.vstack((aData,rData))
np.savetxt("alldata.csv",aData, delimiter=',', fmt='%s')
Upvotes: 2