Reputation: 173
First of all, I am totally new on Python, so, maybe is something super simple I am not doing correctly.
I am reading a multiple worksheet xlsx file and sending each of them to separated dataframe. (at least, I think I am doing it).
xl = pd.ExcelFile("results/report.xlsx")
d = {} # your dict.
for sheet in xl.sheet_names:
d[f'{sheet}']= pd.read_excel(xl,sheet_name=sheet)
lista_colunas = [7, 10, 101, 102, 103, 104]
d['Seg3_results'].columns[lista_colunas].values
This is the result.
>>> print(d)
{'Sheet': Empty DataFrame
Columns: []
Index: [], 'report': Empty DataFrame
Columns: []
Index: [], 'Seg10_results': ID Hora de início Hora de conclusão Email ... Humanas Exatas Linguagens Biológicas
0 1 2021-04-28 13:38:51 2021-04-28 16:25:59 anonymous ... 38 50 38 38
1 2 2021-04-28 17:02:11 2021-04-28 17:57:48 anonymous ... 25 0 25 38
[2 rows x 105 columns], 'Seg1_results': ID Hora de início Hora de conclusão ... Exatas Linguagens Biológicas
0 1 2020-05-26 08:30:00 2020-05-26 09:15:00 ... 25 29 38
1 2 2020-05-26 08:31:12 2020-05-26 09:21:38 ... 38 33 38
2 3 2020-05-26 08:27:40 2020-05-26 09:21:38 ... 50 29 38
Then, I am trying to print just some of columns of each df (trying it manually)
lista_colunas = [7, 10, 101, 102, 103, 104]
d['Seg10_results'].columns[lista_colunas].values
But I am getting only this:
>>> d['Seg10_results'].columns[lista_colunas].values
array(['NOME COMPLETO', 'QUAL A SUA OFICINA DE APRENDIZAGEM?', 'Humanas',
'Exatas', 'Linguagens', 'Biológicas'], dtype=object)
Any value is being shown
If I call only d['Seg10_results'][lista_colunas]
, I get this:
>>> d['Seg10_results'][lista_colunas]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\frame.py", line 3461, in __getitem__
indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py", line 1314, in _get_listlike_indexer
self._validate_read_indexer(keyarr, indexer, axis)
File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py", line 1374, in _validate_read_indexer
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([7, 10, 101, 102, 103, 104], dtype='int64')] are in the [columns]"
What am I doing wrongly?
In time, this is part of a major work. All I am trying to do it, filter some columns of all worksheets, and save them into a new xlsx file (again, separated by worksheets, but filtered)
I know this is far from a beautiful code, but it is working at the moment.
dados = pd.read_excel("results/report.xlsx", sheet_name=None)
df = pd.concat(dados[frame] for frame in dados.keys())
lista_colunas = [7, 10, 101, 102, 103, 104]
filtro = df.columns[lista_colunas]
final_df = df[filtro]
grouped_df = final_df.groupby(final_df.columns[1])
salas = grouped_df.groups.keys()
writer = pd.ExcelWriter('results/resultado.xlsx', engine='xlsxwriter')
for sala in salas:
splitdf = grouped_df.get_group(sala)
splitdf.to_excel(writer, sheet_name=str(sala))
writer.save()
Upvotes: 2
Views: 837
Reputation: 91
d['Seg10_results'][lista_colunas]
is basically d['Seg10_results][7, 10, 101, 102, 103, 104]
and none of the items in lista_colunas
is an actual column in d['Seg10_results']
.
You might want to either:
use pandas.DataFrame.iloc
(docs) for example,
d['Seg10_results'].iloc[:, lista_colunas]
; or
store d['Seg10_results'].columns[lista_colunas].values
in a variable, i.e. cols
and do
d['Seg10_results'][cols]
.
Upvotes: 1
Reputation: 57
Your columns are named and indexed so i think you'll need to call them by name.
Here is a code snippet for the best way to handle data (Pandas DataFrames)
1.
import pandas as pd
btc = pd.read_csv('BTC_Analysis/BTC-USD.csv')
dataframe = pd.DataFrame()
print(dataframe['Date'])
For multiple entries just add the 2nd bracket:
print(dataframe[['Date', 'Open']])
Here's some quick info from Pandas docs:
https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
Additionally you may find some useful information in this tutorial
Ps. handling XLSX files can be cumbersome, if possible it's usually better to use CSV format.
Upvotes: 0
Reputation: 71580
You don't need to add .columns
, nor do you need values
.
Instead of:
d['Seg10_results'].columns[lista_colunas].values
Try:
d['Seg10_results'][lista_colunas]
Upvotes: 0