errno98

Reputation: 332

Column of encoded labels based on whole dataframe

I have a pandas DataFrame:

df = pd.DataFrame([[1,0,0,1], [0,1,0,0], [0,0,0,0], [1,0,0,0]], columns=list("ABCD"))
>>> df
   A  B  C  D
0  1  0  0  1
1  0  1  0  0
2  0  0  0  0
3  1  0  0  0

I want to create a single-column DataFrame of the same height as df that holds one label per row, where each distinct combination of 1s and 0s in a row is assigned a different class (preferably numeric). It should look like this:

>>> df_labels
    x
0   0
1   1
2   2
3   3

I'm looking for a solution based on built-in functions from libraries such as pandas or sklearn rather than one coded from scratch, although any help is appreciated.

So far I have come up with this solution:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

labels = []
for i in range(len(df)):
    # build a string such as "1001" from every row (by position)
    val = "".join(str(x) for x in df.iloc[i])
    labels.append(val)

# encode the row strings as numeric labels (sorted string order)
enc = LabelEncoder()
enc.fit(labels)
df_labels = pd.DataFrame(enc.transform(labels))

>>> df_labels
   0
0  3
1  1
2  0
3  2

Is there a better way to do it?

Upvotes: 0

Views: 76

Answers (3)

Andy L.

Reputation: 25239

If you only need a general label encoding that separates the combinations of columns 'A', 'B', 'C', 'D' (not necessarily in the order of your desired output), using dot is a simple way:

import numpy as np

n = np.arange(1, len(df.columns)+1)
n

Out[14]: array([1, 2, 3, 4])

df.dot(n)

Out[15]:
0    5
1    2
2    0
3    1
dtype: int64

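So each row is encoded as the weighted sum of its columns, and the four combinations in this df get distinct values. Note that weights 1..n do not guarantee uniqueness in general: rows [1, 1, 0, 0] and [0, 0, 1, 0] both give 3. Using powers of two as weights, n = 2 ** np.arange(len(df.columns)), maps every 0/1 combination to a distinct value.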
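A small check of how the choice of weights affects uniqueness, as a sketch. The frame `demo` and the names `linear` and `binary` are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two different rows that collide under weights 1..n
demo = pd.DataFrame([[1, 1, 0, 0], [0, 0, 1, 0]], columns=list("ABCD"))

linear = np.arange(1, len(demo.columns) + 1)  # weights 1, 2, 3, 4
binary = 2 ** np.arange(len(demo.columns))    # weights 1, 2, 4, 8

linear_codes = demo.dot(linear)  # 1 + 2 and 3: both rows give 3
binary_codes = demo.dot(binary)  # each row read as a binary number: 3 and 4

print(linear_codes.tolist())
print(binary_codes.tolist())
```

With power-of-two weights each column contributes its own bit, so two rows of 0s and 1s get the same code only if they are identical. The codes are not consecutive, which may or may not matter for the use case.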

Upvotes: 1

BENY

Reputation: 323226

You can do this with factorize, which numbers the distinct rows in order of first appearance:

pd.factorize(df.apply(tuple, axis=1))[0]
array([0, 1, 2, 3])

pd.Series(pd.factorize(df.apply(tuple, axis=1))[0])
0    0
1    1
2    2
3    3
dtype: int64
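To see how this behaves when a combination repeats, here is a minimal sketch. The frame `df_dup` (the question's df plus a hypothetical copy of row 0) and the column name "x" are assumptions for illustration. It also shows `groupby(...).ngroup()` as another built-in option, which by default numbers the combinations in sorted order rather than in order of appearance.

```python
import pandas as pd

df = pd.DataFrame([[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]],
                  columns=list("ABCD"))

# Hypothetical extra row: a duplicate of row 0, appended at the end
df_dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)

# One tuple per row, then integer codes in order of first appearance
codes, _ = pd.factorize(df_dup.apply(tuple, axis=1))
df_labels = pd.DataFrame({"x": codes})

# Alternative: number the groups formed by all columns (sorted key order)
sorted_labels = df_dup.groupby(list(df_dup.columns)).ngroup()

print(df_labels["x"].tolist())
print(sorted_labels.tolist())
```

Identical rows share a label in both variants; only the numbering differs.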

Upvotes: 1

exchez

Reputation: 503

As far as I know there isn't a built-in method, but you can do something like this:

df.apply(lambda x: '_'.join(x.astype(str)), axis=1).astype('category').cat.codes
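The same string-then-category idea can also be written without a lambda. This is a sketch, assuming the question's df; the names `row_strings` and `codes` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]],
                  columns=list("ABCD"))

# Convert every cell to a string, then join each row into a key like "1_0_0_1"
row_strings = df.astype(str).agg("_".join, axis=1)

# Category codes follow the sorted order of the distinct strings
codes = row_strings.astype("category").cat.codes

print(row_strings.tolist())
print(codes.tolist())
```

Because the categories are sorted, the codes match the ordering produced by the LabelEncoder approach in the question rather than the order of appearance.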

Upvotes: 0
