mpy
mpy

Reputation: 632

Subset a pandas datadrame based on a variable of pd.series type

I have a dataframe like below:

df = pd.DataFrame({"location": ["north", "south", "south", "north"], "store" : ["a", "b", "c", "d"], "12345" : [2,4,3,6], "23434": [4,3,7,0], "23454": [8,9,0,1], "65432": [4,6,5,7], "34254": [0,0,9,1]})

df

I would like to subset it by below variable:

var = pd.Series([12345, 65432, 34254])

var

the final output should look like below: desired output

my original dataframe and variable are big and I cannot use column names or use .loc and .iloc for sub setting, so any help would be appreciated.

Upvotes: 1

Views: 448

Answers (1)

jezrael
jezrael

Reputation: 863751

Convert values to strings, because columns names are strings by Index.astype and then to list and select by subset:

df = df[['location', 'store'] + var.astype(str).tolist()]
print (df)
  location store  12345  65432  34254
0    north     a      2      4      0
1    south     b      4      6      0
2    south     c      3      5      9
3    north     d      6      7      1

If possible some values in var not match add Index.intersection:

var = pd.Series([12345, 65432, 34254, 1000])

df = df[['location', 'store'] + df.columns.intersection(var.astype(str), sort=False).tolist()]
print (df)
  location store  12345  65432  34254
0    north     a      2      4      0
1    south     b      4      6      0
2    south     c      3      5      9
3    north     d      6      7      1

Another idea is create MultiIndex:

df = df.set_index(['location','store'])
df = df[var.astype(str).tolist()]
#if possible some values not match
#df = df[df.columns.intersection(var.astype(str), sort=False).tolist()]
print (df)
                12345  65432  34254
location store                     
north    a          2      4      0
south    b          4      6      0
         c          3      5      9
north    d          6      7      1

Upvotes: 2

Related Questions