Reputation: 401
I want to pull only column A from my spreadsheet. I have the below code, but it pulls from all columns.
from openpyxl import Workbook, load_workbook
wb=load_workbook("/home/ilissa/Documents/AnacondaFiles/AZ_Palmetto_MUSC_searchterms.xlsx", use_iterators=True)
sheet_ranges=wb['PrivAlert Terms']
for row in sheet_ranges.iter_rows(row_offset=1):
for cell in row:
print(cell.value)
Upvotes: 30
Views: 119769
Reputation: 63
from openpyxl import load_workbook
book = load_workbook('excel.xlsx', read_only=True, data_only=True)
sheet = book['Sheet1']
column_C_values = [cell[0].value for cell in sheet.iter_rows(min_col=3, max_col=3)]
Upvotes: 0
Reputation: 161
Updated answer from ZLNK's response :
import openpyxl
wb=openpyxl.load_workbook('file_name.xlsm')
first_sheet = wb.sheetnames
worksheet = wb[first_sheet[2]] # index '2' is user input
for row in range(2,worksheet.max_row+1):
for column in "E": #Here you can add or reduce the columns
cell_name = "{}{}".format(column, row)
vv=worksheet[cell_name].value
Upvotes: 0
Reputation: 339
I know I might be late joining to answer this thread. But atleast my answer might benifit someone else who might be looking to solve.
You have to iterate through the column values of the sheet. According to my opinion, one could implement like this:
from openpyxl import load_workbook
wb = load_workbook("/home/ilissa/Documents/AnacondaFiles/AZ_Palmetto_MUSC_searchterms.xlsx", read_only=True)
sheet = wb['PrivAlert Terms']
for val in sheet.iter_rows(max_col=1):
print(val[0].value)
iter_rows
loops through the rows of the specified columns. You can specify the arguments of iter_rows
from min_row to max_row
and also max_col
. Setting max_col=1
here makes it loop through all the rows of column(column upto the maximum specified). This pulls all the values of only firstcolumn of your spreadsheet
Similarly if you want to iterate through all the columns of a row, that is in horizontal direction, then you can use iter_cols
specifying the from row and till column attributes
Upvotes: 0
Reputation: 187
In my opinion is much simpler
from openpyxl import Workbook, load_workbook
wb = load_workbook("your excel file")
source = wb["name of the sheet"]
for cell in source['A']:
print(cell.value)
Upvotes: 14
Reputation: 861
this is an alternative to previous answers in case you whish read one or more columns using openpyxl
import openpyxl
wb = openpyxl.load_workbook('origin.xlsx')
first_sheet = wb.get_sheet_names()[0]
worksheet = wb.get_sheet_by_name(first_sheet)
#here you iterate over the rows in the specific column
for row in range(2,worksheet.max_row+1):
for column in "ADEF": #Here you can add or reduce the columns
cell_name = "{}{}".format(column, row)
worksheet[cell_name].value # the value of the specific cell
... your tasks...
I hope that this be useful.
Upvotes: 28
Reputation: 637
By using openpyxl library and Python's list comprehensions concept:
import openpyxl
book = openpyxl.load_workbook('testfile.xlsx')
user_data = book.get_sheet_by_name(str(sheet_name))
print([str(user_data[x][0].value) for x in range(1,user_data.max_row)])
It is pretty amazing approach and worth a try
Upvotes: 1
Reputation: 166
Using openpyxl
from openpyxl import load_workbook
# The source xlsx file is named as source.xlsx
wb=load_workbook("source.xlsx")
ws = wb.active
first_column = ws['A']
# Print the contents
for x in xrange(len(first_column)):
print(first_column[x].value)
Upvotes: 15
Reputation: 716
Using ZLNK's excellent response, I created this function that uses list comprehension to achieve the same result in a single line:
def read_column(ws, begin, columns):
return [ws["{}{}".format(column, row)].value for row in range(begin, len(ws.rows) + 1) for column in columns]
You can then call it by passing a worksheet, a row to begin on and the first letter of any column you want to return:
column_a_values = read_column(worksheet, 2, 'A')
To return column A and column B, the call changes to this:
column_ab_values = read_column(worksheet, 2, 'AB')
Upvotes: 0
Reputation: 834
Here is a simple function:
import openpyxl
def return_column_from_excel(file_name, sheet_name, column_num, first_data_row=1):
wb = openpyxl.load_workbook(filename=file_name)
ws = wb.get_sheet_by_name(sheet_name)
min_col, min_row, max_col, max_row = (column_num, first_data_row, column_num, ws.max_row)
return ws.get_squared_range(min_col, min_row, max_col, max_row)
Upvotes: 2
Reputation: 19507
Use ws.get_squared_range()
to control precisely the range of cells, such as a single column, that is returned.
Upvotes: 2
Reputation: 2032
I would suggest using the pandas library.
import pandas as pd
dataFrame = pd.read_excel("/home/ilissa/Documents/AnacondaFiles/AZ_Palmetto_MUSC_searchterms.xlsx", sheetname = "PrivAlert Terms", parse_cols = 0)
If you don't feel comfortable in pandas, or for whatever reason need to work with openpyxl, the error in your code is that you aren't selecting only the first column. You explicitly call for each cell in each row. If you only want the first column, then only get the first column in each row.
for row in sheet_ranges.iter_rows(row_offset=1):
print(row[0].value)
Upvotes: 2