Jasmine N

Reputation: 91

Get a set of unique values per column from a nested list

I have this nested list, X_train:

X_train = [['sunny', 'hot', 'high', 'FALSE'],
 ['sunny', 'hot', 'high', 'TRUE'],
 ['overcast', 'hot', 'high', 'FALSE'],
 ['rainy', 'mild', 'high', 'FALSE'],
 ['rainy', 'cool', 'normal', 'FALSE'],
 ['rainy', 'cool', 'normal', 'TRUE'],
 ['overcast', 'cool', 'normal', 'TRUE'],
 ['sunny', 'mild', 'high', 'FALSE'],
 ['sunny', 'cool', 'normal', 'FALSE'],
 ['rainy', 'mild', 'normal', 'FALSE'],
 ['sunny', 'mild', 'normal', 'TRUE'],
 ['overcast', 'mild', 'high', 'TRUE'],
 ['overcast', 'hot', 'normal', 'FALSE'],
 ['rainy', 'mild', 'high', 'TRUE']]

I want to generate a list whose nth element is the set of unique values in the nth column of X_train. So the expected output should be:

[{'overcast', 'rainy', 'sunny'},
 {'cool', 'hot', 'mild'},
 {'high', 'normal'},
 {'FALSE', 'TRUE'}]

My code is as follows:

questions=[]
f=set({w for row in X_train for w in row})
questions+=[f]

This produces a single set of all unique values across every column (shown below), which is not what I expected. How should I change the code to get the expected output? I was advised to use set, but I am not sure how to apply it correctly.

[{'FALSE',
  'TRUE',
  'cool',
  'high',
  'hot',
  'mild',
  'normal',
  'overcast',
  'rainy',
  'sunny'}]

Any ideas? Thanks in advance.

Upvotes: 0

Views: 461

Answers (3)

nishant jha

Reputation: 1

If you don't want to use zip, you can use this approach instead. It is long, but simple and very basic:

X_train = [['sunny', 'hot', 'high', 'FALSE'],
['sunny', 'hot', 'high', 'TRUE'],
['overcast', 'hot', 'high', 'FALSE'],
['rainy', 'mild', 'high', 'FALSE'],
['rainy', 'cool', 'normal', 'FALSE'],
['rainy', 'cool', 'normal', 'TRUE'],
['overcast', 'cool', 'normal', 'TRUE'],
['sunny', 'mild', 'high', 'FALSE'],
['sunny', 'cool', 'normal', 'FALSE'],
['rainy', 'mild', 'normal', 'FALSE'],
['sunny', 'mild', 'normal', 'TRUE'],
['overcast', 'mild', 'high', 'TRUE'],
['overcast', 'hot', 'normal', 'FALSE'],
['rainy', 'mild', 'high', 'TRUE']]
f=[]
temp1=set()
temp2=set()
temp3=set()
temp4=set()
for i in X_train:
    temp1.add(i[0])
    temp2.add(i[1])
    temp3.add(i[2])
    temp4.add(i[3])
f.append(temp1)
f.append(temp2)
f.append(temp3)
f.append(temp4)
del(temp1)
del(temp2)
del(temp3)
del(temp4)
print(f)
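
The same idea can be written without one hand-named set per column. This is only a sketch, assuming every row has the same length as the first one; the name unique_per_column and the shortened three-row sample are mine, not from the question:

```python
# Shortened sample in the same shape as X_train from the question.
X_train = [
    ['sunny', 'hot', 'high', 'FALSE'],
    ['rainy', 'mild', 'normal', 'TRUE'],
    ['sunny', 'cool', 'high', 'TRUE'],
]

# One empty set per column; the column count is taken from the first row,
# so this assumes all rows have the same length.
unique_per_column = [set() for _ in X_train[0]]

for row in X_train:
    for index, value in enumerate(row):
        # Adding a value that is already present leaves the set unchanged.
        unique_per_column[index].add(value)

print(unique_per_column)
```

This still avoids zip, but it keeps working if the number of columns changes.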

Upvotes: 0

Oli

Reputation: 2602

A concise way to get your expected output is list(map(set, zip(*X_train))).

zip(*X_train) switches rows and columns, giving something roughly equivalent to:

[['sunny',
  'sunny',
  'overcast',
  'rainy',
  'rainy',
  'rainy',
  'overcast',
  'sunny',
  'sunny',
  'rainy',
  'sunny',
  'overcast',
  'overcast',
  'rainy'],
 ['hot',
  'hot',
  'hot',
  'mild',
  'cool',
  'cool',
  'cool',
  'mild',
  'cool',
  'mild',
  'mild',
  'mild',
  'hot',
  'mild'],
 ['high',
  'high',
  'high',
  'high',
  'normal',
  'normal',
  'normal',
  'high',
  'normal',
  'normal',
  'normal',
  'high',
  'normal',
  'high'],
 ['FALSE',
  'TRUE',
  'FALSE',
  'FALSE',
  'FALSE',
  'TRUE',
  'TRUE',
  'FALSE',
  'FALSE',
  'FALSE',
  'TRUE',
  'TRUE',
  'FALSE',
  'TRUE']]

Each column is then mapped to a set, and the resulting map object is converted to a list.
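
To see the two steps separately, here is a small sketch on a made-up three-row input (the names rows, columns and unique are only for illustration). Note that zip actually yields tuples rather than lists, which is why the listing above is only "roughly equivalent":

```python
# Made-up miniature input: three rows, two columns.
rows = [['sunny', 'hot'], ['rainy', 'hot'], ['sunny', 'mild']]

# Step 1: zip(*rows) passes each row as a separate argument to zip,
# which pairs up the first items, then the second items, and so on.
columns = list(zip(*rows))
print(columns)  # [('sunny', 'rainy', 'sunny'), ('hot', 'hot', 'mild')]

# Step 2: map(set, ...) turns each column tuple into a set, dropping duplicates.
unique = list(map(set, columns))
print(unique)
```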

Upvotes: 0

Mark

Reputation: 92440

You can zip() the list to get the columns. Unpacking the rows with * is the trick here. Then just take sets of the columns:

X_train = [['sunny', 'hot', 'high', 'FALSE'],
 ['sunny', 'hot', 'high', 'TRUE'],
 ['overcast', 'hot', 'high', 'FALSE'],
 ['rainy', 'mild', 'high', 'FALSE'],
 ['rainy', 'cool', 'normal', 'FALSE'],
 ['rainy', 'cool', 'normal', 'TRUE'],
 ['overcast', 'cool', 'normal', 'TRUE'],
 ['sunny', 'mild', 'high', 'FALSE'],
 ['sunny', 'cool', 'normal', 'FALSE'],
 ['rainy', 'mild', 'normal', 'FALSE'],
 ['sunny', 'mild', 'normal', 'TRUE'],
 ['overcast', 'mild', 'high', 'TRUE'],
 ['overcast', 'hot', 'normal', 'FALSE'],
 ['rainy', 'mild', 'high', 'TRUE']]

values = [set(col) for col in zip(*X_train)]

This gives you values:

[{'overcast', 'rainy', 'sunny'},
 {'cool', 'hot', 'mild'},
 {'high', 'normal'},
 {'FALSE', 'TRUE'}]
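
Sets are unordered, so the order in which the items print can differ from what is shown here. If a stable order matters, one option is to sort each set into a list. A sketch on a shortened sample (rows and values_sorted are illustrative names):

```python
# Shortened sample in the same shape as X_train.
rows = [
    ['sunny', 'hot', 'high', 'FALSE'],
    ['overcast', 'hot', 'normal', 'TRUE'],
    ['rainy', 'mild', 'high', 'FALSE'],
]

# Same comprehension as above, but each set is sorted into a list
# so the result has a predictable order.
values_sorted = [sorted(set(col)) for col in zip(*rows)]
print(values_sorted)
```

The trade-off is that the elements are now lists rather than sets, so membership tests are slower on large columns.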

Upvotes: 7
