Reputation: 7227
I have a pandas DataFrame with a sorted, numerical index that contains duplicates. Within a given column, the values are identical for rows that share the same index value. I would like to iterate through the values of that column once per unique index value.
Example
import pandas as pd
df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])
   a  b
1  3  4
1  3  6
2  5  8
I want to iterate through the values in column a for the unique entries in the index, i.e. [3, 5].
When I iterate over the default index and print the type of each value in column a, I get Series objects for the duplicated index entries.
for i in df.index:
    cell_value = df['a'].loc[i]
    print(type(cell_value))
Output:
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'numpy.int64'>
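For context, a minimal sketch of why the types differ: a label lookup with .loc returns a Series when the label occurs more than once in the index, and a scalar when it occurs exactly once. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# Label 1 appears twice, so .loc returns every matching row as a Series.
dup_lookup = df['a'].loc[1]
# Label 2 appears once, so .loc returns the single scalar value.
single_lookup = df['a'].loc[2]

print(type(dup_lookup).__name__, len(dup_lookup))
print(type(single_lookup).__name__)
```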
Upvotes: 1
Views: 3986
Reputation: 19957
Another solution using groupby and apply:
df.groupby(level=0).apply(lambda x: type(x.a.iloc[0]))
Out[330]:
1 <class 'numpy.int64'>
2 <class 'numpy.int64'>
dtype: object
To make your loop solution work, create a temporary df:
df_new = df.groupby(level=0).first()
for i in df_new.index:
    cell_value = df_new['a'].loc[i]
    print(type(cell_value))
<class 'numpy.int64'>
<class 'numpy.int64'>
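If the temporary DataFrame is not needed elsewhere, here is a sketch of the same idea that iterates over (index, value) pairs directly. It assumes, as stated in the question, that column a holds a single value per index label; the names first_a and pairs are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# first() keeps the first row of each index group, giving one value per label.
first_a = df.groupby(level=0)['a'].first()

# items() yields (index label, value) pairs, one per unique label.
pairs = list(first_a.items())
for label, value in pairs:
    print(label, value)
```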
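Or use drop_duplicates(). Note that calling it on the whole df compares entire rows, and in the example column b differs between the two rows with index 1, so restrict it to column a first (this assumes different index values never share the same value in a):
df_dedup = df[['a']].drop_duplicates()
for i in df_dedup.index:
    cell_value = df_dedup['a'].loc[i]
    print(type(cell_value))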
<class 'numpy.int64'>
<class 'numpy.int64'>
Upvotes: 0
Reputation: 164773
This seems like an XY problem if, as per your comment, the same index means the same data.
You also don't need a loop for this.
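Assuming you want to drop the repeated values and extract column a only (i.e. 3, 5), the line below should suffice. It deduplicates column a itself rather than whole rows, because column b differs between the two rows with index 1 in the example (this also assumes distinct index values have distinct values in a).
res = df.loc[:, 'a'].drop_duplicates()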
# 1 3
# 2 5
# Name: a, dtype: int64
To return types:
# Iterate over the underlying NumPy array; iterating the Series itself yields Python ints.
types = list(map(type, res.values))
print(types)
# [<class 'numpy.int64'>, <class 'numpy.int64'>]
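Since this approach relies on the same index always meaning the same data, it may be worth checking that assumption before deduplicating. A minimal sketch using groupby and nunique; the helper name column_constant_per_index is hypothetical.

```python
import pandas as pd

def column_constant_per_index(frame, column):
    """Return True if `column` has exactly one distinct value per index label."""
    # nunique() counts the distinct non-null values within each index group.
    return bool(frame.groupby(level=0)[column].nunique().eq(1).all())

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

a_is_constant = column_constant_per_index(df, 'a')
b_is_constant = column_constant_per_index(df, 'b')
print(a_is_constant, b_is_constant)
```

For the example df, column a passes the check while column b does not, since index 1 carries both 4 and 6 in b.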
Upvotes: 0
Reputation: 863166
First build a mask of the non-duplicated index entries and use it to pick their positions from np.arange, then select with iloc:
import numpy as np
arr = np.arange(len(df.index))
a = arr[~df.index.duplicated()]
print(a)
[0 2]
for i in a:
    cell_value = df['a'].iloc[i]
    print(type(cell_value))
<class 'numpy.int64'>
<class 'numpy.int64'>
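A possibly shorter way to get the same positions, sketched with np.flatnonzero instead of indexing an arange. This is a substitute technique for illustration, and the names positions and first_values are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# flatnonzero returns the positions where the boolean mask is True.
positions = np.flatnonzero(~df.index.duplicated())

# iloc accepts the whole array of positions at once.
first_values = df['a'].iloc[positions]
print(positions)
print(first_values.tolist())
```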
No-loop solution: use boolean indexing with duplicated and invert the mask with ~:
a = df.loc[~df.index.duplicated(), 'a']
print (a)
1 3
2 5
Name: a, dtype: int64
b = df.loc[~df.index.duplicated(), 'a'].tolist()
print (b)
[3, 5]
print (~df.index.duplicated())
[ True False True]
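As a side note, Index.duplicated takes a keep argument ('first' by default, 'last', or False) that decides which occurrence is treated as the non-duplicate. A short sketch on the same df, printing column b so the difference is visible:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# keep='first' (default): later repeats are flagged, so the first row per label survives.
first_rows = df.loc[~df.index.duplicated(keep='first')]
# keep='last': earlier repeats are flagged, so the last row per label survives.
last_rows = df.loc[~df.index.duplicated(keep='last')]
# keep=False: every repeated label is flagged, leaving only labels that occur once.
unique_only = df.loc[~df.index.duplicated(keep=False)]

print(first_rows['b'].tolist())   # [4, 8]
print(last_rows['b'].tolist())    # [6, 8]
print(unique_only['b'].tolist())  # [8]
```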
Upvotes: 2
Reputation: 402814
Try np.unique:
_, i = np.unique(df.index, return_index=True)
df.iloc[i, df.columns.get_loc('a')].tolist()
[3, 5]
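For what it's worth, return_index=True gives the position of the first occurrence of each unique index value, which is why positional selection works afterwards. A sketch that pairs each unique label with its value from column a; the names labels, first_pos and result are chosen for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# labels: sorted unique index values; first_pos: position of each label's first occurrence.
labels, first_pos = np.unique(df.index, return_index=True)

# Positional selection avoids the duplicate-label lookup that returns a Series.
values = df['a'].to_numpy()[first_pos]

result = dict(zip(labels.tolist(), values.tolist()))
print(result)
```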
Upvotes: 2