I am looking for a way to get the class labels from my DataFrame, which contains rows of features.
For instance, in this example:
import pandas as pd

df = pd.DataFrame([
    ['1', 'a', 'bb', '0'],
    ['1', 'a', 'cc', '0'],
    ['2', 'a', 'dd', '1'],
    ['2', 'a', 'ee', '1'],
    ['3', 'a', 'ff', '2'],
    ['3', 'a', 'gg', '2'],
    ['3', 'a', 'hh', '2']], columns=['ID', 'name', 'type', 'class'])
df
  ID name type class
0  1    a   bb     0
1  1    a   cc     0
2  2    a   dd     1
3  2    a   ee     1
4  3    a   ff     2
5  3    a   gg     2
6  3    a   hh     2
My class array should be (i.e. for each ID, the class value should be picked once):
class
array([0., 1., 2.,])
EDIT
df['class'].values produces array(['0', '0', '1', '1', '2', '2', '2'], dtype=object)
Expected answer:
I want array([0, 1, 2])
Upvotes: 0
Views: 4282
Reputation: 3594
You can use groupby + unique() as follows:
>>> df.groupby('ID')['class'].unique().astype(int).to_numpy()
array([0, 1, 2])
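The groupby line above relies on every ID having exactly one distinct class, since unique() returns an array per group. Here is a minimal sketch of a variant that takes the first class seen for each ID instead, assuming that is the behavior you want if an ID ever carries more than one class (the name per_id is just illustrative):

```python
import pandas as pd

df = pd.DataFrame([
    ['1', 'a', 'bb', '0'],
    ['1', 'a', 'cc', '0'],
    ['2', 'a', 'dd', '1'],
    ['2', 'a', 'ee', '1'],
    ['3', 'a', 'ff', '2'],
    ['3', 'a', 'gg', '2'],
    ['3', 'a', 'hh', '2']], columns=['ID', 'name', 'type', 'class'])

# Take the first class value seen for each ID, then convert the strings to ints.
per_id = df.groupby('ID')['class'].first().astype(int).to_numpy()
print(per_id)  # [0 1 2]
```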
For the given DataFrame, you can use the following methods:
Solution 1: Series.unique():
>>> df['class'].unique()
array(['0', '1', '2'], dtype=object)
# in case you want int output
>>> df['class'].unique().astype(int)
array([0, 1, 2])
Solution 2: value_counts():
>>> df['class'].value_counts(ascending=True).index.to_numpy().astype(int)
array([0, 1, 2])
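One thing to keep in mind: value_counts() orders its result by frequency, not by label, so Solution 2 gives 0, 1, 2 here only because of how the counts happen to fall. Series.unique() keeps the order of first appearance. If you need the labels sorted no matter how the rows are arranged, something like the sketch below should work. It uses made-up data (the classes Series is hypothetical) and numpy.unique, which returns sorted unique values:

```python
import numpy as np
import pandas as pd

# Hypothetical data where the most frequent label appears first.
classes = pd.Series(['2', '2', '2', '0', '1', '1'])

# Series.unique() keeps the order in which values first appear.
first_seen = classes.astype(int).unique()
print(first_seen)  # [2 0 1]

# np.unique() returns the distinct values in sorted order.
labels = np.unique(classes.astype(int).to_numpy())
print(labels)  # [0 1 2]
```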
Upvotes: 1
Reputation: 12397
If multiple IDs can share the same class, select the 'ID' and 'class' columns, drop duplicates, and then take the class column. Otherwise, simply use unique() as suggested in the other answer (you can of course convert this result to ints too):
df[['ID','class']].drop_duplicates()['class'].values
#['0' '1' '2']
or, similar to @wii's suggestion in the comments:
df.drop_duplicates('ID')['class'].values
#['0' '1' '2']
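To see where the two approaches differ, here is a small sketch with made-up data (df2 is hypothetical, not the frame from the question) in which IDs '1' and '2' share class '0'. Dropping duplicates on 'ID' keeps one class per ID, while unique() collapses the shared class into a single entry:

```python
import pandas as pd

# Hypothetical data: IDs '1' and '2' both belong to class '0'.
df2 = pd.DataFrame({
    'ID':    ['1', '1', '2', '2', '3'],
    'class': ['0', '0', '0', '0', '1'],
})

# One class per ID: keeps the first row of each ID.
per_id = df2.drop_duplicates('ID')['class'].astype(int).to_numpy()
print(per_id)  # [0 0 1]

# Distinct classes overall: the shared class appears only once.
distinct = df2['class'].astype(int).unique()
print(distinct)  # [0 1]
```

Which one is right depends on whether you want one label per ID or just the set of labels.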
Upvotes: 0