Niccola Tartaglia

Reputation: 1667

Subset pandas dataframe with lists

I would like to subset my dataframe based on a couple of lists of variables, that is:

 list1=[var1,var2,var3]
 list2=[var4,var5,var6]

 data_final = data[list1,list2]

which produced this error:

 TypeError: unhashable type: 'list'

If I provide a single list, everything works fine:

 data_final = data[list1]

Below is a minimal example:

 import pandas as pd

 dict1 = [{'var0': 0, 'var1': 1, 'var2': 2},
          {'var0': 0, 'var1': 2, 'var2': 4},
          {'var0': 1, 'var1': 5, 'var2': 8},
          {'var0': 1, 'var1': 15, 'var2': 12}]
 df = pd.DataFrame(dict1, index=['s1', 's2', 's3', 's4'])

 list1 = ['var0']
 list2 = ['var1', 'var2']

These two commands work fine:

  df[list1]
  df[list2]

But this one produces the error mentioned above:

  df[list1,list2]

Upvotes: 3

Views: 2690

Answers (3)

PydPiper

Reputation: 498

To load any number of lists into a dataframe as rows (as long as the lists are all the same length), you would do the following:

import pandas as pd
l1 = [1,2,3]
l2 = [10,20,30]
col_name = ['c1','c2','c3']
row_name = ['r1','r2']
pd.DataFrame([l1,l2],columns=col_name, index=row_name)

    c1  c2  c3
r1   1   2   3
r2  10  20  30

To load any number of lists into a dataframe as columns, you would zip the lists together like so:

l1 = [1,2,3]
l2 = [10,20,30]
col_name = ['c1','c2']
row_name = ['r1','r2','r3']
zipped_list = list(zip(l1,l2))

import pandas as pd

pd.DataFrame(zipped_list,columns=col_name,index=row_name)

    c1  c2
r1   1  10
r2   2  20
r3   3  30

Hope that helps, py-on!

Upvotes: 2

cosmic_inquiry

Reputation: 2684

Is this the output you're expecting?

df[list1 + list2]
Out[106]: 
    var0  var1  var2
s1     0     1     2
s2     0     2     4
s3     1     5     8
s4     1    15    12
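
For context, here is a minimal sketch of why the concatenation works while `df[list1, list2]` does not. Python packs comma-separated items inside `[]` into a single tuple before the object ever sees them, so pandas receives one key made of two lists rather than a list of column names. `ShowKey` is a hypothetical helper written only for this illustration; it is not part of pandas.

```python
import pandas as pd

# Same data as the question's minimal example.
df = pd.DataFrame(
    {"var0": [0, 0, 1, 1], "var1": [1, 2, 5, 15], "var2": [2, 4, 8, 12]},
    index=["s1", "s2", "s3", "s4"],
)
list1 = ["var0"]
list2 = ["var1", "var2"]


class ShowKey:
    """Hypothetical helper: returns whatever key [] receives."""

    def __getitem__(self, key):
        return key


# obj[a, b] is passed to __getitem__ as the tuple (a, b).
key = ShowKey()[list1, list2]
print(type(key).__name__, key)

# Concatenating first gives one flat list of column names,
# which is the form pandas expects for multi-column selection.
combined = list1 + list2
subset = df[combined]
print(subset.columns.tolist())
```

If there are more than two lists, the same idea applies: build one flat list first (for example with repeated `+`) and pass that to `df[...]`.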

Upvotes: 2

Neeraj Nair

Reputation: 205

You need to write your column names in one list, not as separate lists:

data_final = data[[var1, var2, var3, var4, var5, var6]]

From docs:

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner
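
The first two sentences of that quote can be checked with a short sketch, reusing the data from the question. The column name `missing` is made up for the example and is assumed not to exist in the frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"var0": [0, 0, 1, 1], "var1": [1, 2, 5, 15], "var2": [2, 4, 8, 12]},
    index=["s1", "s2", "s3", "s4"],
)

# Columns come back in the order given in the list,
# not in the order they appear in the original frame.
reordered = df[["var2", "var0"]]
print(reordered.columns.tolist())

# Asking for a column that is not in the frame raises KeyError.
try:
    df[["var0", "missing"]]
    raised = False
except KeyError:
    raised = True
print(raised)
```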

Upvotes: 2
