Reputation: 23
I am trying to organise a column by filtering the values. In other words, there are thousands of repetitive names and I want to take just one name from each "group" and copy it in a other column.
So the column A is the current situation and the column be is the result I want to get:
Column A Column B
AB Mark Sociedad Ltda AB Mark Sociedad Ltda
AB Mark Sociedad Ltda Acosta Acosta Manuel
AB Mark Sociedad Ltda ALBAGLI, ZALIASNIK
AB Mark Sociedad Ltda
Acosta Acosta Manuel
Acosta Acosta Manuel
Acosta Acosta Manuel
ALBAGLI, ZALIASNIK
ALBAGLI, ZALIASNIK
ALBAGLI, ZALIASNIK
Finally this is the script I am trying to use:
import openpyxl
from openpyxl import load_workbook
import os
os.chdir('path')
workbook = openpyxl.load_workbook('abc.xlsx')
page_i = workbook.get_sheet_names()
sheet = workbook.get_sheet_by_name('Sheet1')
for a in range(1, 10):
representativex = sheet['A' + str(a)].value
tuple(sheet['A1':'A10'])
for row in sheet['A1':'A10']:
if representativex in row:
continue
else:
sheet['B' + str(a)].value
sheet['B' + str(a)] = representativex
workbook.save('abc.xlsx')
Unfortunately it doesn't work.
Upvotes: 1
Views: 15222
Reputation: 14236
I don't really use Python for this but here is a crude way that I found relatively quickly.
import openpyxl
wb = openpyxl.load_workbook('test.xlsx')
ws1 = wb.active
names = []
for row in ws1.columns[0]:
names.append(row.value)
names = sorted(list(set(names)))
start = 1
for name in names:
ws1.cell(row = start, column=2).value = name
start += 1
wb.save('test.xlsx')
Edit: Apparently the newer upgrade of openpyxl
needs a slight modification
Change this:
for row in ws1.columns[0]:
names.append(row.value)
To this:
for row in ws1.iter_cols(max_col = 1, min_row=1):
for cell in row:
names.append(cell.value)
And just in case your columns are different,
iter_cols(min_col=None, max_col=None, min_row=None, max_row=None)[source]
Returns all cells in the worksheet from the first row as columns.
If no boundaries are passed in the cells will start at A1.
If no cells are in the worksheet an empty tuple will be returned.
Parameters:
min_col (int) – smallest column index (1-based index)
min_row (int) – smallest row index (1-based index)
max_col (int) – largest column index (1-based index)
max_row (int) – smallest row index (1-based index)
Upvotes: 2