user12178011
user12178011

Reputation:

Issues converting xlsx to 2d array openpyxl

I have an excel spreadsheet with 2 columns that I am attempting to convert to a 2d array with the following code:

#!/usr/bin/python3

import openpyxl
import sys
import os

book = openpyxl.load_workbook('contact2019.xlsx')

sheet = book.active


first_sheet = book.get_sheet_names()[0]
worksheet = book.get_sheet_by_name(first_sheet)
excel_data = [[0 for x in range(2)] for y in range(1)]

print(len(excel_data))
first = 0
cell_num = 0
for row in range(2,worksheet.max_row+1):
    for column in "AB":  #Here you can add or reduce the columns
        cell_name = "{}{}".format(column, row)
        excel_data.append(worksheet[cell_name].value)



print(excel_data)

My issue is that the data is simply being added to a 1d array sequentially, so my output of this is as follows:

['Sam Adams', '*******@gmail.com']

This is not a 2d array which I have initialized.

Input data is as follows:

Sam Adams   **********@gmail.com
Sammy Adams **********@gmail.com
Samuel Adams    **********@gmail.com
Samantha Adams  **********@gmail.com
Sam Adams   **********@gmail.com

Why is this assigning data sequentially instead of two per row?

Upvotes: 1

Views: 1966

Answers (2)

Trenton McKinney
Trenton McKinney

Reputation: 62483

Alternatively: Use pandas:

import pandas as pd
import numpy as np

df = pd.read_excel('test.xlsx')  # change the name of the file as needed

# dataframe
           name                 email
      Sam Adams  **********@gmail.com
    Sammy Adams  **********@gmail.com
   Samuel Adams  **********@gmail.com
 Samantha Adams  **********@gmail.com
      Sam Adams  **********@gmail.com

Create array:

excel_data = df.to_numpy()
print(excel_data)

# Output
array([['Sam Adams', '**********@gmail.com'],
       ['Sammy Adams', '**********@gmail.com'],
       ['Samuel Adams', '**********@gmail.com'],
       ['Samantha Adams', '**********@gmail.com'],
       ['Sam Adams', '**********@gmail.com']], dtype=object)

Upvotes: 1

zglin
zglin

Reputation: 2919

Welcome to SO!

Your code below iterates through and adds each item as a separate item, hence you get everything in sequential order.

for row in range(2,worksheet.max_row+1):
    for column in "AB":  #Here you can add or reduce the columns
        cell_name = "{}{}".format(column, row)
        excel_data.append(worksheet[cell_name].value)

Rather than looping through first the rows and then all the columns, you could replace the code to only loop through the rows.

for row in range(2,worksheet.max_row+1):        
    excel_data.append([worksheet["A{}".format(row)].value,worksheet["B{}".format(row)].value])

By doing this you are now creating a list of 2 element lists (or a 2d array).

If you aren't set on doing it this way, you can also consider the python library pandas which abstracts out a lot of the work and lets you work with spreadsheet like objects called DataFrames.

Upvotes: 0

Related Questions