user13641081

create 1D array from data frame column

I am looking for a way to get the class labels from my dataframe, which contains rows of features.

For instance, in this example:

import pandas as pd

df = pd.DataFrame([
['1', 'a', 'bb', '0'],
['1', 'a', 'cc', '0'],
['2', 'a', 'dd', '1'],
['2', 'a', 'ee', '1'],
['3', 'a', 'ff', '2'],
['3', 'a', 'gg', '2'],
['3', 'a', 'hh', '2']], columns=['ID', 'name', 'type', 'class'])

df 
    ID  name    type class
0   1    a      bb      0
1   1    a      cc      0
2   2    a      dd      1
3   2    a      ee      1
4   3    a      ff      2
5   3    a      gg      2
6   3    a      hh      2

My class array should be (i.e. for each ID the class value should be picked once):

class
array([0., 1., 2.,])

EDIT

df['class'].values produces array(['0', '0', '1', '1', '2', '2', '2'], dtype=object)

Expected answer:

I want array([0, 1, 2])

Upvotes: 0

Views: 4282

Answers (2)

Grayrigel

Reputation: 3594

You can use groupby + unique() as follows:

>>> df.groupby('ID')['class'].unique().astype(int).to_numpy()
array([0, 1, 2])
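If each ID carries exactly one class value (an assumption, though it holds for the sample data), taking the first class per ID may be a more robust variant, since it avoids casting a Series of one-element arrays to int. A minimal sketch on the question's data:

```python
import pandas as pd

# Sample data from the question (all values are strings).
df = pd.DataFrame([
    ['1', 'a', 'bb', '0'],
    ['1', 'a', 'cc', '0'],
    ['2', 'a', 'dd', '1'],
    ['2', 'a', 'ee', '1'],
    ['3', 'a', 'ff', '2'],
    ['3', 'a', 'gg', '2'],
    ['3', 'a', 'hh', '2']], columns=['ID', 'name', 'type', 'class'])

# One class value per ID; assumes the class is constant within each ID.
classes = df.groupby('ID')['class'].first().astype(int).to_numpy()
print(classes)  # [0 1 2]
```

Note that groupby sorts the groups by ID, so the result follows the sorted IDs rather than the order in which they appear.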

For the given dataframe, you can also use the following methods:

Solution 1: Series.unique():

>>> df['class'].unique()
array(['0', '1', '2'], dtype=object)

# in case you want int outputs
>>> df['class'].unique().astype(int)
array([0, 1, 2])

Solution 2: value_counts():

>>> df['class'].value_counts(ascending=True).index.to_numpy().astype(int)
array([0, 1, 2])
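One caveat worth checking on your own data: value_counts orders the labels by how often they occur, while unique keeps them in order of first appearance. The two agree on the sample above only because of its particular counts. A small sketch with hypothetical values (not from the question) where they differ:

```python
import pandas as pd

# Hypothetical labels: '0' occurs three times, '1' once, '2' twice.
s = pd.Series(['0', '0', '0', '1', '2', '2'], name='class')

# Ordered by ascending frequency: '1' (1), '2' (2), '0' (3).
by_count = s.value_counts(ascending=True).index.to_numpy().astype(int)

# Ordered by first appearance in the column.
by_appearance = s.astype(int).unique()

print(by_count)       # [1 2 0]
print(by_appearance)  # [0 1 2]
```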

Upvotes: 1

Ehsan

Reputation: 12397

In case multiple IDs can have the same class, select the 'ID' and 'class' columns, drop duplicates, and then take the class column. Otherwise, simply use unique as suggested in the other answer (you can of course convert this result to ints too):

df[['ID','class']].drop_duplicates()['class'].values
#['0' '1' '2']

or similar to @wii's suggestion in comments:

df.drop_duplicates('ID')['class'].values
#['0' '1' '2']
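To see why the distinction matters, here is a small sketch with hypothetical data (not the question's) in which two IDs share a class. Dropping duplicates by ID keeps one entry per ID, whereas unique collapses the repeated class:

```python
import pandas as pd

# Hypothetical data: IDs '1' and '3' both belong to class '0'.
df2 = pd.DataFrame({
    'ID':    ['1', '1', '2', '3', '3'],
    'class': ['0', '0', '1', '0', '0'],
})

# One class per ID: the first row of each ID is kept.
per_id = df2.drop_duplicates('ID')['class'].astype(int).to_numpy()

# Distinct class values only, regardless of ID.
distinct = df2['class'].astype(int).unique()

print(per_id)    # [0 1 0]
print(distinct)  # [0 1]
```

Which of the two you want depends on whether the array should line up with the IDs or just list the labels that exist.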

Upvotes: 0
