Reputation: 1
I'm trying to read spreadsheets in an xlsb file in python and I've used to code below to do so. I found the code in stack overflow and I'm sure that it reads every single column in a row of a spreadsheet and appends it to a dataframe. How can I modify this code so that it only reads/appends specific columns of the spreadsheet i.e. I only want to append data in columns B through D into my dataframe.
Any help would be appreciated.
import pandas as pd
from pyxlsb import open_workbook as open_xlsb
df = []
with open_xlsb('some.xlsb') as wb:
with wb.get_sheet(1) as sheet:
for row in sheet.rows():
df.append([item.v for item in row])
df = pd.DataFrame(df[1:], columns=df[0])
Upvotes: 0
Views: 3801
Reputation: 11
Just do:
import pandas as pd
from pyxlsb import open_workbook as open_xlsb
df = []
with open_xlsb('some.xlsb') as wb:
with wb.get_sheet(1) as sheet:
for row in sheet.rows():
df.append([item.v for item in row if item.c > 0 and item.c < 4])
df = pd.DataFrame(df[1:], columns=df[0])
item.c refers to the column number starting at 0
Upvotes: 1
Reputation: 304
pyxlsb itself cannot do it, but it is doable with the help of xlwings.
import pandas as pd
import xlwings as xw
from pyxlsb import open_workbook as open_xlsb
with open_xlsb(r"W:\path\filename.xlsb") as wb:
Data=xw.Range('B:D').value
#Creates a dataframe using the first list of elements as columns
Data_df = pd.DataFrame(Data[1:], columns=Data[0])
Upvotes: 1