Gerard

Reputation: 117

Why does a pandas DataFrame raise "ValueError: invalid literal for int() with base 10" when using iloc?

I would like to plot all rows whose index value is "p1", with x="State" and y="Score". However, the iloc call fails with the following error:

Traceback (most recent call last):
  File "C:/Users/u/Documents/Projects/Python/TEST6.py", line 33, in <module>
    df = f.iloc[["p1"]]
  File "C:\Users\u\Anaconda3\envs\k\lib\site-packages\pandas\core\indexing.py", line 931, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\u\Anaconda3\envs\k\lib\site-packages\pandas\core\indexing.py", line 1557, in _getitem_axis
    return self._get_list_axis(key, axis=axis)
  File "C:\Users\u\Anaconda3\envs\k\lib\site-packages\pandas\core\indexing.py", line 1530, in _get_list_axis
    return self.obj._take_with_is_copy(key, axis=axis)
  File "C:\Users\u\Anaconda3\envs\k\lib\site-packages\pandas\core\generic.py", line 3625, in _take_with_is_copy
    result = self.take(indices=indices, axis=axis)
  File "C:\Users\u\Anaconda3\envs\k\lib\site-packages\pandas\core\generic.py", line 3613, in take
    indices, axis=self._get_block_manager_axis(axis), verify=True
  File "C:\Users\u\Anaconda3\envs\k\lib\site-packages\pandas\core\internals\managers.py", line 851, in take
    else np.asanyarray(indexer, dtype="int64")
  File "C:\Users\u\Anaconda3\envs\k\lib\site-packages\numpy\core\_asarray.py", line 171, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
ValueError: invalid literal for int() with base 10: 'p1'
Process finished with exit code 1

My code:

import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame
f = {
    'State':[1000,1002,1001,1003,1000,1003,1001],
    'Score':[62,47,55,74,31,50,60]}

f = pd.DataFrame(f,columns=['State','Score'])

# Create indexes
points_index= []
index_all = []
range_of_index=2
for i in range(len(f)):
    points_index.append(f"p"+str(int((i%range_of_index)+1)))
    index_all.append(i+1)

# Create IndexesFrame
data_indexes = pd.DataFrame({"Index_all": index_all,
                             "Points_index": points_index})

# Get DataFrames together
f = pd.concat([data_indexes, f], axis=1)

# Set Indexes
f = f.set_index(['Index_all',"Points_index"])

print(f)

# Create Dataframe with only p1 indexes
df = f.iloc[["p1"]]

# Create figure with plot
fig, ax1 = plt.subplots()
ax1.scatter(df['State'],df['Score'])
fig.suptitle('')
plt.show()

The printed DataFrame with the new indexes looks the way I wanted:

                        State  Score
Index_all Points_index              
1         p1             1000     62
2         p2             1002     47
3         p1             1001     55
4         p2             1003     74
5         p1             1000     31
6         p2             1003     50
7         p1             1001     60

I am new to pandas and could not work out what is wrong in my code.

Upvotes: 0

Views: 269

Answers (2)

Henry Ecker

Reputation: 35676

iloc selects by integer position only, so it cannot look up the label "p1". Use xs instead to take the cross-section where Points_index equals p1:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Create a DataFrame
f = pd.DataFrame({
    'State': [1000, 1002, 1001, 1003, 1000, 1003, 1001],
    'Score': [62, 47, 55, 74, 31, 50, 60]
}, columns=['State', 'Score'])

# Simpler DataFrame index creation:
range_of_index = 2
r = pd.Series(np.arange(len(f)))
f = f.set_index(pd.MultiIndex.from_arrays(
    [r + 1, 'p' + ((r % range_of_index) + 1).astype(str)],
    names=['Index_all', "Points_index"]
))

# Create Dataframe with only p1 indexes
# (select the cross-section of p1 values)
df = f.xs(key='p1', level='Points_index')

# Create figure with plot
fig, ax1 = plt.subplots()
ax1.scatter(df['State'], df['Score'])
fig.suptitle('')
plt.show()

f:

                        State  Score
Index_all Points_index              
1         p1             1000     62
2         p2             1002     47
3         p1             1001     55
4         p2             1003     74
5         p1             1000     31
6         p2             1003     50
7         p1             1001     60

[Figure: scatter plot of Score against State for the p1 rows]
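
To make the difference between position-based and label-based selection concrete, here is a minimal sketch that rebuilds the same frame (the index is written out by hand here, which is a simplification and not the code above). It assumes that the exact exception type raised by iloc for a string key varies between pandas versions, so it catches several. It also shows that xs drops the selected level by default, and that drop_level=False keeps it.

```python
import pandas as pd

# Rebuild the example frame with the same two-level index
# (index values typed out directly for brevity).
f = pd.DataFrame(
    {
        "State": [1000, 1002, 1001, 1003, 1000, 1003, 1001],
        "Score": [62, 47, 55, 74, 31, 50, 60],
    },
    index=pd.MultiIndex.from_arrays(
        [list(range(1, 8)), ["p1", "p2", "p1", "p2", "p1", "p2", "p1"]],
        names=["Index_all", "Points_index"],
    ),
)

# iloc works with integer positions: rows 0, 2, 4 and 6 are the p1 rows.
by_position = f.iloc[[0, 2, 4, 6]]

# iloc does not accept labels. The exception type is assumed to depend
# on the pandas version, so several are caught here.
iloc_failed = False
try:
    f.iloc[["p1"]]
except (ValueError, IndexError, TypeError) as exc:
    iloc_failed = True
    print(f"iloc rejected the label: {type(exc).__name__}")

# xs selects by label on a named level. By default the level is dropped.
dropped = f.xs("p1", level="Points_index")

# drop_level=False keeps Points_index in the result's index.
kept = f.xs("p1", level="Points_index", drop_level=False)

print(dropped)
print(kept)
```

If the rest of your code still needs the Points_index level (for example to label the plot), drop_level=False is probably the variant you want.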

Upvotes: 3

Joep

Reputation: 834

Another option is to build a boolean mask from the Points_index level and pass it to iloc:

df = f.iloc[f.index.get_level_values('Points_index')=="p1"]
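
As a rough check of how this relates to the other approaches, the sketch below applies the same mask through both iloc and loc and compares the result with xs. The frame is rebuilt with a hand-written index for brevity, and the variable names (mask, via_iloc, via_loc, via_xs) are only illustrative. Unlike xs with its default settings, the mask keeps both index levels.

```python
import pandas as pd

# Same example frame, with the index typed out directly.
f = pd.DataFrame(
    {
        "State": [1000, 1002, 1001, 1003, 1000, 1003, 1001],
        "Score": [62, 47, 55, 74, 31, 50, 60],
    },
    index=pd.MultiIndex.from_arrays(
        [list(range(1, 8)), ["p1", "p2", "p1", "p2", "p1", "p2", "p1"]],
        names=["Index_all", "Points_index"],
    ),
)

# Boolean array: True where the Points_index level equals "p1".
mask = f.index.get_level_values("Points_index") == "p1"

# Both accessors accept a boolean array of the same length as the frame.
via_iloc = f.iloc[mask]
via_loc = f.loc[mask]

# xs with drop_level=False also keeps both index levels.
via_xs = f.xs("p1", level="Points_index", drop_level=False)

print(via_iloc)
print(via_iloc.equals(via_loc))
```

Boolean masks are more commonly passed to loc, but since the mask here is a plain NumPy array, iloc handles it as well.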

Upvotes: 0
