create 1D array from data frame column

Question

I am looking for a way to get the class labels from my DataFrame, which contains rows of features.

For instance, in this example:

import pandas as pd

df = pd.DataFrame([
    ['1', 'a', 'bb', '0'],
    ['1', 'a', 'cc', '0'],
    ['2', 'a', 'dd', '1'],
    ['2', 'a', 'ee', '1'],
    ['3', 'a', 'ff', '2'],
    ['3', 'a', 'gg', '2'],
    ['3', 'a', 'hh', '2']], columns=['ID', 'name', 'type', 'class'])

df
  ID name type class
0  1    a   bb     0
1  1    a   cc     0
2  2    a   dd     1
3  2    a   ee     1
4  3    a   ff     2
5  3    a   gg     2
6  3    a   hh     2

My class array should be as follows (that is, the class value should be picked once for each ID):

class
array([0., 1., 2.])

EDIT

df['class'].values produces array(['0', '0', '1', '1', '2', '2', '2'], dtype=object)

Expected answer:

I want array([0, 1, 2])

Ehsan · Accepted Answer

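If multiple IDs can share the same class, unique alone would collapse them into one label, so select the 'ID' and 'class' columns, drop duplicates, and then take the 'class' column. If every class belongs to a single ID, simply use unique as suggested in the other answer. Either way the values are still strings, so convert them to ints if needed. For the first case: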

df[['ID','class']].drop_duplicates()['class'].values
#['0' '1' '2']

Or, along the lines of @wii's suggestion in the comments:

df.drop_duplicates('ID')['class'].values
#['0' '1' '2']
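
To get integers rather than strings, one option is to cast the column before pulling out the array. This is a minimal sketch, not the only way to do it: it rebuilds the question's df, and the extra row with ID '4' is hypothetical, added only to show how dropping duplicates per ID differs from unique when two IDs share a class.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    [
        ["1", "a", "bb", "0"],
        ["1", "a", "cc", "0"],
        ["2", "a", "dd", "1"],
        ["2", "a", "ee", "1"],
        ["3", "a", "ff", "2"],
        ["3", "a", "gg", "2"],
        ["3", "a", "hh", "2"],
    ],
    columns=["ID", "name", "type", "class"],
)

# Keep the first row of each ID, then cast the string labels to integers.
labels = df.drop_duplicates("ID")["class"].astype(int).to_numpy()
print(labels)  # [0 1 2]

# Hypothetical extra row (not in the question): ID '4' shares class '0' with ID '1'.
extra = pd.DataFrame([["4", "a", "ii", "0"]], columns=df.columns)
df2 = pd.concat([df, extra], ignore_index=True)

# One label per ID keeps the repeated class ...
per_id = df2.drop_duplicates("ID")["class"].astype(int).to_numpy()
# ... while unique() keeps each distinct class only once.
distinct = df2["class"].astype(int).unique()
print(per_id)    # [0 1 2 0]
print(distinct)  # [0 1 2]
```

Which of the two you want depends on whether the array should line up with the IDs (one entry per ID) or just list the distinct classes.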
