packet007
packet007

Reputation: 55

How to read an excel file in Python?

I am newbie to Python. Basically, I want to write a program to read column D & E from an excel file, and calculate the total Incoming and Outgoing duration.

Which Python module is used to read excel files and how to process data inside it?

Excel file:

D            E
Incoming    18
Outgoing    99
Incoming    20
Outgoing    59
Incoming    30
Incoming    40

Upvotes: 1

Views: 26582

Answers (3)

Michael Moura
Michael Moura

Reputation: 229

there are a couple of options depending on the version of excel you are using.
openpyxl - used for reading Excel 2010 files (ie: .xlsx)
xlrd - used for reading older Excel files (ie: .xls)

I have only used xlrd, which you could do something like the below
** Note ** code not tested

import xlrd


current_row = 0
sheet_num = 1
input_total = 0
output_total = 0

# path to the file you want to extract data from
src = r'c:\temp\excel sheet.xls'

book = xlrd.open_workbook(src)

# select the sheet where the data resides
work_sheet = book.sheet_by_index(sheet_num)

# get the total number of rows
num_rows = work_sheet.nrows - 1

while current_row < num_rows:
    row_header = work_sheet.cell_value(current_row, 4)

    if row_header == 'output':
        output_total += work_sheet.cell_value(current_row, 5)
    elif row_header == 'input':
        input_total += work_sheet.cell_value(current_row, 5)

print output_total
print input_total

Upvotes: 5

raymelfrancisco
raymelfrancisco

Reputation: 871

Using xlrd 0.9.3 in Python 3.4.1:

It puts all values from row D and E in two separate list.

It then combines each parallel elements of these lists (simply elements with same index) to a tuple using zip().

Then, these generated tuples are combined to a list. Using sum() and list comprehension, incoming_sum and outgoing_sum are calculated.

import xlrd

with xlrd.open_workbook('z.xlsx') as book:

    # 0 corresponds for 1st worksheet, usually named 'Book1'
    sheet = book.sheet_by_index(0)

    # gets col D values
    D = [ D for D in sheet.col_values(3) ]

    # gets col E values
    E = [ E for E in sheet.col_values(4) ]

    # combines D and E elements to tuples, combines tuples to list
    # ex. [ ('Incoming', 18), ('Outgoing', 99), ... ]
    data = list( zip(D, E) )

    # gets sum
    incoming_sum = sum( tup[1] for tup in data if tup[0] == 'Incoming' )
    outgoing_sum = sum( tup[1] for tup in data if tup[0] == 'Outgoing' )

    print('Total incoming:', incoming_sum)
    print('Total outgoing:', outgoing_sum)

Output:

Total incoming: 108.0
Total outgoing: 158.0

To install xlrd: (Windows)

  1. Download here: https://pypi.python.org/pypi/xlrd
  2. Extract to any directory, then change cmd's current directory ( chdir ) to the directory where you extracted, then type in cmd python setup.py install

    • Take note that you will extract xlrd-0.9.3.tar.gz two times, first to remove .gz, second to remove .tar.

    • The extracted directory (where you will change your cmd's current directory) will look like this:enter image description here

Upvotes: 1

brenns10
brenns10

Reputation: 3369

It seems like simply using Excel's =SUMIF() function would be sufficient. However, you're asking for a Python solution, so here's a Python solution!

Pandas is a library that provides a DataFrame data structure very similar to an Excel spreadsheet. It provides a read_excel() function, whose documentation you can find here. Once you have a DataFrame, you could do something like this:

import pandas as pd
table = pd.read_excel('path-to-spreadsheet.xlsx')
incoming_sum = table.E[table.D == 'Incoming'].sum()
outgoing_sum = table.E[table.D == 'Outgoing'].sum()

You can get Pandas for Python on Windows, but it's a bit difficult. The easiest way is a Scientific Python distribution for Windows, like Anaconda. On Linux, installing pandas is simple as sudo pip install pandas.

Upvotes: 2

Related Questions