Henry Maathuis

Reputation: 123

Elegant way to encode a list of lists

Currently I am trying to one-hot encode a list of lists that each contain a single element. What is a clean, Pythonic way to go from representation 2 to representation 1? Additionally, I would like to know a clean approach to go from representation 1 to representation 2.

Representation 1

[[1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 ...
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0.]]
(256, 6)

Representation 2

[[0.]
 [3.]
 [3.]
 ...
 [2.]
 [3.]
 [2.]]
(256, 1)

Upvotes: 2

Views: 505

Answers (5)

yacola

Reputation: 3023

Using a plain conditional list comprehension, for representation 1 to 2:

r1 = [[1., 0., 0., 0., 0., 0.],
      [0., 0., 0., 1., 0., 0.],
      [0., 0., 0., 0., 1., 0.]]
len_r1l = len(r1[0]) # length of each sublist, here 6

r2 = [[0], [3], [4]]

r1_r2 = [[i] for l in r1 for i in range(len_r1l) if l[i]==1]
# [[0], [3], [4]]

and for representation 2 to 1:

r2_r1 = [[1. if i==idx[0] else 0. for i in range(len_r1l)] for idx in r2]
# [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
#  [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
#  [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]]

Equivalently, using NumPy with np.nonzero:

import numpy as np

# convert to arrays
r1_np = np.asarray(r1)
r2_np = np.asarray(r2)

r1_r2 = np.nonzero(r1_np)[1]
# array([0, 3, 4])  (1-D; append [:, None] for shape (n, 1))

r2_r1 = np.zeros_like(r1_np)
r2_r1[np.arange(r1_r2.shape[0]),r1_r2] = 1.
# array([[1., 0., 0., 0., 0., 0.],
#        [0., 0., 0., 1., 0., 0.],
#        [0., 0., 0., 0., 1., 0.]])

Then, if you really want plain lists, use the np.ndarray.tolist method:

r1_r2.tolist()
# [0, 3, 4]
r2_r1.tolist()
# [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
#  [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
#  [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]]

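Benchmarking these approaches on the intended input of 256 rows shows the NumPy versions running roughly 15 to 25 times faster (note that the NumPy timings exclude the list-to-array conversion):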

# representation 1 to 2
%timeit [[i] for l in r1 for i in range(len_r1l) if l[i]==1]
# 199 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.nonzero(r1_np)[1]
# 13 µs ± 32.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# representation 2 to 1
%timeit [[1. if i==idx[0] else 0 for i in range(len_r1l)] for idx in r2]
# 243 µs ± 820 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit r2_r1 = np.zeros_like(r1_np); r2_r1[np.arange(r1_r2.shape[0]),r1_r2] = 1.
# 9.42 µs ± 15.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
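To rerun a comparison like this outside IPython, here is a minimal sketch using the standard timeit module. The 256×6 input is randomly generated (an assumption, not the asker's data), the function names are hypothetical, and absolute timings will vary by machine. As with the %timeit runs above, only the conversions are timed, not the list-to-array step.

```python
import timeit

import numpy as np

# Hypothetical benchmark data: 256 random one-hot rows of width 6,
# matching the (256, 6) shape from the question.
rng = np.random.default_rng(0)
labels = rng.integers(0, 6, size=256)
r1_np = np.zeros((256, 6))
r1_np[np.arange(256), labels] = 1.
r1 = r1_np.tolist()                  # list-of-lists version of representation 1
r2 = [[int(i)] for i in labels]      # representation 2 as single-element lists
len_r1l = len(r1[0])
r1_r2 = np.nonzero(r1_np)[1]         # precomputed column indices, as in the answer


def list_1_to_2():
    return [[i] for l in r1 for i in range(len_r1l) if l[i] == 1]


def numpy_1_to_2():
    return np.nonzero(r1_np)[1]


def list_2_to_1():
    return [[1. if i == idx[0] else 0. for i in range(len_r1l)] for idx in r2]


def numpy_2_to_1():
    out = np.zeros_like(r1_np)
    out[np.arange(r1_r2.shape[0]), r1_r2] = 1.
    return out


# Time each conversion and report the mean time per call in microseconds.
for fn in (list_1_to_2, numpy_1_to_2, list_2_to_1, numpy_2_to_1):
    n = 1000
    seconds = timeit.timeit(fn, number=n)
    print(f"{fn.__name__}: {seconds / n * 1e6:.1f} µs per call")
```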

Hope this helps.

Upvotes: 3

amzon-ex

Reputation: 1744

Using numpy,

import numpy as np

rep_2 = np.where(condition)[1].reshape(rep_1.shape[0], 1)

where condition can be written in several ways, for example:

  • rep_1 == 1
  • rep_1 != 0

Pick whichever matches your data (the two agree when the array holds only 0s and 1s). Convert rep_2 to a list with rep_2.tolist() if you wish.
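A small runnable sketch of the np.where approach with both conditions, on a hypothetical three-row input. Called with only a condition, np.where returns a tuple of (row indices, column indices), and the column indices are the labels. The reshape assumes exactly one match per row.

```python
import numpy as np

# Hypothetical example input: one one-hot row per sample.
rep_1 = np.array([[1., 0., 0., 0., 0., 0.],
                  [0., 0., 0., 1., 0., 0.],
                  [0., 0., 1., 0., 0., 0.]])

# np.where(cond) returns (row_indices, col_indices); [1] keeps the columns.
rep_2_eq = np.where(rep_1 == 1)[1].reshape(rep_1.shape[0], 1)
rep_2_ne = np.where(rep_1 != 0)[1].reshape(rep_1.shape[0], 1)

print(rep_2_eq.tolist())  # [[0], [3], [2]]
```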

Upvotes: 0

Josh Clark

Reputation: 1012

Representation 1 --> 2:

If you know that every list will have one and only one 1, you can use list.index in a list comprehension:

list_of_lists = [  # Your initial list
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
]

list_of_ones_indices = [[lst.index(1)] for lst in list_of_lists]
# [[0], [1], [2]]
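list.index returns only the position of the first 1 and raises ValueError when there is none, so the one-and-only-one assumption matters. A sketch of a hypothetical helper, one_hot_to_indices, that checks it before indexing:

```python
def one_hot_to_indices(list_of_lists):
    """Hypothetical helper: return [[index_of_the_1], ...], raising if a row is not one-hot."""
    result = []
    for row in list_of_lists:
        # 1.0 == 1 and 0.0 == 0 in Python, so float rows like [0., 1., 0.] work too.
        if row.count(1) != 1 or row.count(0) != len(row) - 1:
            raise ValueError(f"not a one-hot row: {row}")
        result.append([row.index(1)])
    return result


print(one_hot_to_indices([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # [[0], [1], [2]]
```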

Representation 2 --> 1:

A NumPy solution might be closer to what you're looking for, but if you want a pure-Python solution, here you go:

index_list = [[1], [2], [3]]  # representation 2: one single-element list per row
LENGTH = 6
one_hot_list = []

# This can also be achieved with a list comprehension and range()
for index in index_list:
    one_hot = [0] * LENGTH
    one_hot[index[0]] = 1
    one_hot_list.append(one_hot)

print(one_hot_list)
# [
#     [0, 1, 0, 0, 0, 0],
#     [0, 0, 1, 0, 0, 0],
#     [0, 0, 0, 1, 0, 0]
# ]
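The comment in the loop mentions a list comprehension with range(); a sketch of that variant, using a small hypothetical index_list. Because it compares i == index[0] instead of using the value as a list index, it also works when the stored indices are floats such as 3., as in the question's representation 2.

```python
index_list = [[1], [2], [3]]  # representation 2: one single-element list per row
LENGTH = 6

# For each row, put 1 at the stored index and 0 everywhere else.
one_hot_list = [[1 if i == index[0] else 0 for i in range(LENGTH)] for index in index_list]

print(one_hot_list)
# [[0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
```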

Upvotes: 1

Scott Boston

Reputation: 153500

IIUC,

np.argmax(a, axis=1)[:, None]

Using @Yacola's setup:

import numpy as np

r1 = [[1., 0., 0., 0., 0., 0.],
      [0., 0., 0., 1., 0., 0.],
      [0., 0., 0., 0., 1., 0.]]

a = np.array(r1)
np.argmax(a, axis=1)[:, None]

Output:

array([[0],
       [3],
       [4]])
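One caveat: np.argmax returns the index of the first maximum, so a row with no 1 at all silently maps to 0. A sketch of that behaviour, plus a hypothetical validating wrapper (argmax_labels is not a NumPy function):

```python
import numpy as np

# A row with no 1 at all: argmax still returns 0 (the first maximum).
a = np.array([[1., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0.]])
print(np.argmax(a, axis=1)[:, None].tolist())  # [[0], [0]]


def argmax_labels(one_hot):
    """Hypothetical helper: argmax labels of shape (n, 1), after validating the input."""
    one_hot = np.asarray(one_hot)
    # Every entry must be 0 or 1, and every row must sum to exactly 1.
    if not (np.all(np.isin(one_hot, (0, 1))) and np.all(one_hot.sum(axis=1) == 1)):
        raise ValueError("each row must contain exactly one 1 and zeros elsewhere")
    return np.argmax(one_hot, axis=1)[:, None]
```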

Upvotes: 0

Francisco

Reputation: 11496

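For converting from representation 2 to representation 1, you can use Keras's to_categorical (older Keras versions expose it as keras.utils.np_utils.to_categorical, newer ones as keras.utils.to_categorical):

>>> from keras.utils import np_utils
>>> y = [0, 1, 2]
>>> np_utils.to_categorical(y)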
array([[ 1., 0., 0.],
       [ 0., 1., 0.],
       [ 0., 0., 1.]])
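If you would rather not depend on Keras, a plain NumPy sketch (not Keras's implementation) indexes rows of an identity matrix, since row k of np.eye(n) is the one-hot vector for class k. The width num_classes = 6 is assumed from the question's (256, 6) shape, and because representation 2 holds floats like 3., the values are cast to int before indexing.

```python
import numpy as np

# Representation 2 as in the question: floats in an (n, 1) array.
r2 = np.array([[0.], [3.], [3.], [2.]])
num_classes = 6  # assumed width of representation 1

labels = r2[:, 0].astype(int)     # shape (n,), integer class indices
r1 = np.eye(num_classes)[labels]  # pick the matching identity row per label

print(r1)
```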

Upvotes: 0
