Hans Schmidt
Hans Schmidt

Reputation: 23

Filter a column in excel Python

I am trying to organise a column by filtering the values. In other words, there are thousands of repetitive names and I want to take just one name from each "group" and copy it in a other column.

So the column A is the current situation and the column be is the result I want to get:

Column A                   Column B

AB Mark Sociedad Ltda      AB Mark Sociedad Ltda
AB Mark Sociedad Ltda      Acosta Acosta Manuel
AB Mark Sociedad Ltda      ALBAGLI, ZALIASNIK 
AB Mark Sociedad Ltda
Acosta Acosta Manuel 
Acosta Acosta Manuel 
Acosta Acosta Manuel
ALBAGLI, ZALIASNIK 
ALBAGLI, ZALIASNIK
ALBAGLI, ZALIASNIK

Finally this is the script I am trying to use:

import openpyxl
from openpyxl import load_workbook
import os

os.chdir('path')
workbook = openpyxl.load_workbook('abc.xlsx')
page_i = workbook.get_sheet_names()
sheet = workbook.get_sheet_by_name('Sheet1')

for a in range(1, 10):
    representativex = sheet['A' + str(a)].value
    tuple(sheet['A1':'A10'])

    for row in sheet['A1':'A10']:
        if representativex in row:
            continue
        else:
            sheet['B' + str(a)].value 
            sheet['B' + str(a)] = representativex

        workbook.save('abc.xlsx')

Unfortunately it doesn't work.

Upvotes: 1

Views: 15222

Answers (1)

gold_cy
gold_cy

Reputation: 14236

I don't really use Python for this but here is a crude way that I found relatively quickly.

import openpyxl

wb = openpyxl.load_workbook('test.xlsx')
ws1 = wb.active

Sample Data

names = []
for row in ws1.columns[0]:
    names.append(row.value)

names = sorted(list(set(names)))

start = 1
for name in names:
    ws1.cell(row = start, column=2).value = name
    start += 1

wb.save('test.xlsx')

Sample Output Data

Edit: Apparently the newer upgrade of openpyxl needs a slight modification

Change this:

for row in ws1.columns[0]:
        names.append(row.value)

To this:

for row in ws1.iter_cols(max_col = 1, min_row=1):
    for cell in row:
        names.append(cell.value)

And just in case your columns are different,

iter_cols(min_col=None, max_col=None, min_row=None, max_row=None)[source]

Returns all cells in the worksheet from the first row as columns.

If no boundaries are passed in the cells will start at A1.

If no cells are in the worksheet an empty tuple will be returned.
Parameters: 

    min_col (int) – smallest column index (1-based index)
    min_row (int) – smallest row index (1-based index)
    max_col (int) – largest column index (1-based index)
    max_row (int) – smallest row index (1-based index)

Upvotes: 2

Related Questions