Ryandcalvert

Reputation: 25

How do I pass multiple column names as value_vars using melt?

I have a large data frame (367 rows × 342 columns) where multiple columns have the same prefix in their name. I am trying to make our code easier to use by selecting those columns by prefix instead of listing each one.

Current code:

test = pd.melt(protstack, id_vars="Majority protein IDs",
               value_vars=['Intensity 01_1',
                           'Intensity 01_2',
                           'Intensity 01_3',
                           'Intensity 03_1',
                           'Intensity 03_2',
                           'Intensity 03_3',
                           'Intensity 04_1',
                           'Intensity 04_2',
                           'Intensity 04_3',
                           'Intensity 05_1',
                           'Intensity 05_2',
                           'Intensity 05_3',
                           'Intensity 06_1',
                           'Intensity 06_2',
                           'Intensity 06_3'],
               var_name="SampleMeas", value_name="SpecInt"
               )

Here is what I am trying to use instead, but I get the error "TypeError: unhashable type: 'list'":

valvarlist = [col for col in protstack if 'Intensity' in col], 
[col for col in protstack if 'iBAQ' in col], 
[col for col in protstack if 'LFQ intensity' in col]
#print(valvarlist)

test = pd.melt(protstack, id_vars="Majority protein IDs", 
                   value_vars = valvarlist,
                    var_name="SampleMeas", value_name="SpecInt"
                               )

I have tried putting valvarlist in [] but I get the same error. When I check type(valvarlist), I get a tuple, which should be usable with melt.

Upvotes: 1

Views: 236

Answers (1)

jezrael

Reputation: 862591

Create a single flat list of column names by chaining the conditions with or:

alvarlist = [col for col in protstack if 
                      ('Intensity' in col) or ('iBAQ' in col) or ('intensity' in col)]

Or use str.contains on the column names, with | as the regex OR between the tested values:

alvarlist = df.columns[df.columns.str.contains('Intensity|iBAQ|intensity')]

Sample:

df = pd.DataFrame(1, columns=['Intensity1','iBAQ1','intensity4','intensity','ss'],
                   index=[0,1])
print (df)
   Intensity1  iBAQ1  intensity4  intensity  ss
0           1      1           1          1   1
1           1      1           1          1   1

protstack = df.columns
alvarlist = [col for col in protstack if 
                      ('Intensity' in col) or ('iBAQ' in col) or ('intensity' in col)]
print (alvarlist)
['Intensity1', 'iBAQ1', 'intensity4', 'intensity']

alvarlist = df.columns[df.columns.str.contains('Intensity|iBAQ|intensity')]
print (alvarlist)
Index(['Intensity1', 'iBAQ1', 'intensity4', 'intensity'], dtype='object')
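
For completeness, here is a minimal sketch of the whole round trip with pd.melt. The small protstack frame and its column names are made up for illustration and only stand in for the real data. The likely reason for the error in the question is that the trailing commas after each list comprehension build a tuple of three lists, so melt receives lists where it expects column labels; a single flat list avoids that.

```python
import pandas as pd

# Hypothetical stand-in for the asker's protstack DataFrame.
protstack = pd.DataFrame({
    "Majority protein IDs": ["P1", "P2"],
    "Intensity 01_1": [10, 20],
    "iBAQ 01_1": [1, 2],
    "LFQ intensity 01_1": [5, 6],
    "Score": [0.1, 0.2],
})

# What the question's code builds: the trailing commas make this
# a tuple containing three separate lists, not one list of labels.
nested = (
    [col for col in protstack if "Intensity" in col],
    [col for col in protstack if "iBAQ" in col],
    [col for col in protstack if "LFQ intensity" in col],
)

# One flat list of column labels, using the chained "or" conditions.
valvarlist = [col for col in protstack
              if "Intensity" in col or "iBAQ" in col or "intensity" in col]

test = pd.melt(protstack, id_vars="Majority protein IDs",
               value_vars=valvarlist,
               var_name="SampleMeas", value_name="SpecInt")
print(test)
```

Columns that match none of the substrings (here the made-up "Score" column) are simply left out of the melted result.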

Upvotes: 2
