Sanket
Sanket

Reputation: 746

Converting table to list in Excel

I have many excel sheets in which data is arranged in table as shown below

Original Table I want to convert each of this table to another list-like format as shown below.

Required format

In this table:

Dir -> Name of  the tab
Year -> Same for entire table
DOM -> Day of the month
DOW -> Day of the week.
Hour -> Column label in original table
Traffic Count -> Value in the original table

There are close to 1000 such sheets. The data in each sheet is at the same location. What is the best way to do this? Should I write a VBA script or is there any thing in Excel I can use to make my life easier?

Upvotes: 0

Views: 1177

Answers (1)

Sanket
Sanket

Reputation: 746

I solved the problem using python and the xlrd module

import xlrd
import numpy as np
from os import listdir, chdir
import calendar
import re

direction = {'N':0,'S':1,'E':2,'W':3}
rend = [0, 40, 37, 40, 39, 40, 39, 40, 40, 39, 40, 39, 40]

# Initialize the matrix to store final result
aData = np.matrix(np.zeros((0,7)))

# get all the xlsx files from the directory.
filenames = [ f for f in listdir('.') if f.endswith('.xlsx') ]

# for each .xlsx in the current directory
for file in filenames:
    # The file names are in the format gdot_39446_yyyy_mm.xlsx
    # yyyy is the year and mm is the month number with 0 -Jan and 11 - Dec
    # I extracted the month and year info from the file name

    ms = re.search('.+_.+_.+_([0-9]+)\.',file,re.M).group(1)
    month = int(ms) + 1
    year = int(file[11:15])

    # open the workbook
    workbook = xlrd.open_workbook(file)

    # the workbook has three sheets. I want information from 
    # sheet2 and sheet3 (indexed by 1 adn 2 resp.)
    for i in range(1,3):
        sheet = workbook.sheet_by_index(i)
        di =  sheet.name[-1]
        data = [[sheet.cell_value(r,c) for c in range(2,26)] for r in range(9,rend[month])]

        mData = np.matrix(data)
        mData[np.where(mData=='')] = 0 # some cells are blank. Insert 0 in those cells
        n,d = mData.shape
        rows = n * d

        rData = np.matrix(np.zeros((rows,7)))
        rData[:,0].fill(direction[di])
        rData[:,1].fill(year)
        rData[:,2].fill(month)
        for i in range(rows):
            rData[i,3] = (i/24) + 1
            rData[i,4] = calendar.weekday(year,month,(i/24) + 1)
            rData[i,5] = i%24

        for i in range(n):
            rData[i*24:((i+1)*24),6] = mData[i,:].T

        aData = np.vstack((aData,rData))

    np.savetxt("alldata.csv",aData, delimiter=',', fmt='%s')    

Upvotes: 2

Related Questions