MR. JD
MR. JD

Reputation: 49

How to get Excel data row by row in a Python list

I want to get data row by row in a python list from an excel file. For example, my excel file contains multiple rows of data and the first element of my python list will be a list that will contain all the information of the first row, the second element of the list will be a list that will contain the information of the second row of the excel file and so on. Can anybody teach me the easiest way of doing that? Thank you :)

Upvotes: 3

Views: 21991

Answers (4)

Arpit Soni
Arpit Soni

Reputation: 11

This will give you list of list -

import pandas as pd

df = pd.read_excel('filename.xlsx')

l1 = []

for index, row in df.iterrows():
    l1.append(row.to_list())

Upvotes: 1

zwitsch
zwitsch

Reputation: 359

Reading a given column of a sheet in a Workbook can be done like this:

    # required packages: pandas, openpyxl

    #--- import section -------------------------------------------------
    import pandas 

    #--- create file variable: ------------------------------------------

    my_excel_file = "my_file.xlsx"

    #--- create lists: --------------------------------------------------
    df = pandas.read_excel(my_excel_file, sheet_name='my_sheet')

    # with header:
    my_list = (list)(df["my_header"].values)

    # without header:
    my_second_list = (list)(df[df.columns[1]].values)

    #--- process list: --------------------------------------------------
    print(len(my_list))
    for my_item in my_list:
            print(my_item, end = "; ")

Upvotes: 0

DS_London
DS_London

Reputation: 4251

If you are already using pandas, then this is relatively straightforward:

import pandas as pd

df = pd.read_excel('book1.xlsx',engine='openpyxl',dtype=object,header=None)

print(df.head())

l = df.values.tolist()

print(l)

NB. You may have to pip install openpyxl if it is not already in your packages.

Pandas read_excel documentation

EDIT: You don't really need the engine and dtype parameters: pandas defaults to openpyxl if you specify ".xlsx", and you can let pandas handle the types in most circumstances.

The header=None is important though, otherwise pandas will interpret the first row of your Excel sheet as the dataframe column names.

Upvotes: 3

Captain Trojan
Captain Trojan

Reputation: 2921

The easiest way would be saving that excel file into a CSV and then loading it. The proprietary excel files would be hard to decode. Use Save as... and the option CSV file.

Then, do this:

filename = ...
rows = []
with open(filename, "r") as f:
    for line in f:
        rows.append(line.split(","))
print(rows)
    

The advantage of this approach is that you need no external libraries to do this. It uses basic Python only.

Upvotes: 1

Related Questions