Reputation: 55
I am newbie to Python.
Basically, I want to write a program to read column D
& E
from an excel file, and calculate the total Incoming
and Outgoing
duration.
Which Python module is used to read excel files and how to process data inside it?
Excel file:
D E
Incoming 18
Outgoing 99
Incoming 20
Outgoing 59
Incoming 30
Incoming 40
Upvotes: 1
Views: 26582
Reputation: 229
there are a couple of options depending on the version of excel you are using.
openpyxl - used for reading Excel 2010 files (ie: .xlsx)
xlrd - used for reading older Excel files (ie: .xls)
I have only used xlrd, which you could do something like the below
** Note ** code not tested
import xlrd
current_row = 0
sheet_num = 1
input_total = 0
output_total = 0
# path to the file you want to extract data from
src = r'c:\temp\excel sheet.xls'
book = xlrd.open_workbook(src)
# select the sheet where the data resides
work_sheet = book.sheet_by_index(sheet_num)
# get the total number of rows
num_rows = work_sheet.nrows - 1
while current_row < num_rows:
row_header = work_sheet.cell_value(current_row, 4)
if row_header == 'output':
output_total += work_sheet.cell_value(current_row, 5)
elif row_header == 'input':
input_total += work_sheet.cell_value(current_row, 5)
print output_total
print input_total
Upvotes: 5
Reputation: 871
Using xlrd 0.9.3 in Python 3.4.1:
It puts all values from row D
and E
in two separate list
.
It then combines each parallel elements of these lists (simply elements with same index) to a tuple using zip()
.
Then, these generated tuples are combined to a list
. Using sum()
and list comprehension, incoming_sum
and outgoing_sum
are calculated.
import xlrd
with xlrd.open_workbook('z.xlsx') as book:
# 0 corresponds for 1st worksheet, usually named 'Book1'
sheet = book.sheet_by_index(0)
# gets col D values
D = [ D for D in sheet.col_values(3) ]
# gets col E values
E = [ E for E in sheet.col_values(4) ]
# combines D and E elements to tuples, combines tuples to list
# ex. [ ('Incoming', 18), ('Outgoing', 99), ... ]
data = list( zip(D, E) )
# gets sum
incoming_sum = sum( tup[1] for tup in data if tup[0] == 'Incoming' )
outgoing_sum = sum( tup[1] for tup in data if tup[0] == 'Outgoing' )
print('Total incoming:', incoming_sum)
print('Total outgoing:', outgoing_sum)
Output:
Total incoming: 108.0
Total outgoing: 158.0
To install xlrd: (Windows)
Extract to any directory, then change cmd
's current directory ( chdir
) to the directory where you extracted, then type in cmd python setup.py install
Take note that you will extract xlrd-0.9.3.tar.gz
two times, first to remove .gz
, second to remove .tar
.
The extracted directory (where you will change your cmd's current directory) will look like this:
Upvotes: 1
Reputation: 3369
It seems like simply using Excel's =SUMIF()
function would be sufficient. However, you're asking for a Python solution, so here's a Python solution!
Pandas is a library that provides a DataFrame data structure very similar to an Excel spreadsheet. It provides a read_excel()
function, whose documentation you can find here. Once you have a DataFrame, you could do something like this:
import pandas as pd
table = pd.read_excel('path-to-spreadsheet.xlsx')
incoming_sum = table.E[table.D == 'Incoming'].sum()
outgoing_sum = table.E[table.D == 'Outgoing'].sum()
You can get Pandas for Python on Windows, but it's a bit difficult. The easiest way is a Scientific Python distribution for Windows, like Anaconda. On Linux, installing pandas
is simple as sudo pip install pandas
.
Upvotes: 2