Adilson V Casula
Adilson V Casula

Reputation: 173

Printing values with Pandas

First of all, I am totally new on Python, so, maybe is something super simple I am not doing correctly.

I am reading a multiple worksheet xlsx file and sending each of them to separated dataframe. (at least, I think I am doing it).

xl = pd.ExcelFile("results/report.xlsx")
d = {} # your dict.
for sheet in xl.sheet_names:
    d[f'{sheet}']= pd.read_excel(xl,sheet_name=sheet)



lista_colunas = [7, 10, 101, 102, 103, 104]
d['Seg3_results'].columns[lista_colunas].values

This is the result.

>>> print(d)
{'Sheet': Empty DataFrame
Columns: []
Index: [], 'report': Empty DataFrame
Columns: []
Index: [], 'Seg10_results':    ID      Hora de início   Hora de conclusão      Email  ...  Humanas  Exatas  Linguagens Biológicas
0   1 2021-04-28 13:38:51 2021-04-28 16:25:59  anonymous  ...       38      50          38         38 
1   2 2021-04-28 17:02:11 2021-04-28 17:57:48  anonymous  ...       25       0          25         38 

[2 rows x 105 columns], 'Seg1_results':     ID      Hora de início   Hora de conclusão  ... Exatas Linguagens  Biológicas
0    1 2020-05-26 08:30:00 2020-05-26 09:15:00  ...     25         29          38
1    2 2020-05-26 08:31:12 2020-05-26 09:21:38  ...     38         33          38
2    3 2020-05-26 08:27:40 2020-05-26 09:21:38  ...     50         29          38

Then, I am trying to print just some of columns of each df (trying it manually)

lista_colunas = [7, 10, 101, 102, 103, 104]
d['Seg10_results'].columns[lista_colunas].values

But I am getting only this:

>>> d['Seg10_results'].columns[lista_colunas].values
array(['NOME COMPLETO', 'QUAL A SUA OFICINA DE APRENDIZAGEM?', 'Humanas',
       'Exatas', 'Linguagens', 'Biológicas'], dtype=object)

Any value is being shown

If I call only d['Seg10_results'][lista_colunas], I get this:

>>> d['Seg10_results'][lista_colunas]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\frame.py", line 3461, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
  File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py", line 1314, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis)
  File "C:\Users\Adilson\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py", line 1374, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([7, 10, 101, 102, 103, 104], dtype='int64')] are in the [columns]"

What am I doing wrongly?

In time, this is part of a major work. All I am trying to do it, filter some columns of all worksheets, and save them into a new xlsx file (again, separated by worksheets, but filtered)


Adding my solution to exporting for single file with multiple sheet

I know this is far from a beautiful code, but it is working at the moment.

dados = pd.read_excel("results/report.xlsx", sheet_name=None)
df = pd.concat(dados[frame] for frame in dados.keys())

lista_colunas = [7, 10, 101, 102, 103, 104]
filtro = df.columns[lista_colunas]
final_df = df[filtro]

grouped_df = final_df.groupby(final_df.columns[1])
salas = grouped_df.groups.keys()

writer = pd.ExcelWriter('results/resultado.xlsx', engine='xlsxwriter')

for sala in salas: 
        splitdf = grouped_df.get_group(sala) 
        splitdf.to_excel(writer, sheet_name=str(sala)) 
writer.save()

Upvotes: 2

Views: 837

Answers (3)

Marvin
Marvin

Reputation: 91

d['Seg10_results'][lista_colunas] is basically d['Seg10_results][7, 10, 101, 102, 103, 104] and none of the items in lista_colunas is an actual column in d['Seg10_results'] .

You might want to either:

  • use pandas.DataFrame.iloc (docs) for example,

    d['Seg10_results'].iloc[:, lista_colunas]; or

  • store d['Seg10_results'].columns[lista_colunas].values in a variable, i.e. cols and do

    d['Seg10_results'][cols].

Upvotes: 1

Asutherland8219
Asutherland8219

Reputation: 57

Your columns are named and indexed so i think you'll need to call them by name.

Here is a code snippet for the best way to handle data (Pandas DataFrames)

1.

import pandas as pd 
   
btc = pd.read_csv('BTC_Analysis/BTC-USD.csv')
  1. dataframe = pd.DataFrame()

    print(dataframe['Date'])

For multiple entries just add the 2nd bracket:

print(dataframe[['Date', 'Open']])

Here's some quick info from Pandas docs:

https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html

Additionally you may find some useful information in this tutorial

Ps. handling XLSX files can be cumbersome, if possible it's usually better to use CSV format.

Upvotes: 0

U13-Forward
U13-Forward

Reputation: 71580

You don't need to add .columns, nor do you need values.

Instead of:

d['Seg10_results'].columns[lista_colunas].values

Try:

d['Seg10_results'][lista_colunas]

Upvotes: 0

Related Questions