Lee
Lee

Reputation: 79

Pandas How to hide the header when the output need header as a condition to filter the result?

Solution

If you hide the header of the output. use the header = None option. And notice that just use it when you are going to print it. For the reason that if you set the header = None when you load the data, the name of the column is unusable to you, so you can't use it to filter the data or do something else.

For example :

print(ResDf.to_string(header = None))

Update

The output I want has no header. For example the output is

  0   1  2   3  4   5  6   7  8   9   10 11 <------------------column name
3  b   7  a   4  b   2  b   6  b NaN  10  b
8  a   8  a   6  b   2  c   4  a NaN  10  c

The output I want is

------------without column name -------------------

3  b   7  a   4  b   2  b   6  b NaN  10  b
8  a   8  a   6  b   2  c   4  a NaN  10  c

But it can't be done by using header = none, so I wonder how to make it?

Thing is that if you set the header = None option, the column name can't be used as a condition to filter the data. Cuz there is no column name already. For example I set the filter ( or called mask) as mask = df[u'客户'].str.contains(Client, na=False) & df[u'型号'].str.contains(GoodsType, na=False). If you set the header = None I think that there is no 型号 or 客户 in the dataframe, so It can't be used. So how to hide header when you still want to use the header to filter the output data?

I want the pandas output without the header, but the output needs the header to filtered.

Here is my code, I knew the trick to set the header=None, but I can't do that because the header still is used as a condition to filter the output. For example here I want the output with the '客户'(which is a column name) contain the certain word 'Tom'(For example). If I use the header = None option, the '客户' will not be recognized. So how to get the output without the header, in my condition?

# -*- coding: utf-8 -*-
# -*- coding: gbk -*-
import pandas as pd
import numpy as np
import sys
import re
import os

import sys

Client    = sys.argv[1]
GoodsType = sys.argv[2]
Weight    = sys.argv[3]

script_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(script_dir ) # change to the path that you already know

pd.set_option('display.max_columns', 1000)
# df = pd.read_excel("packagesum.xlsx", header = None) # '客户' will not be recognized when set the header to None
df = pd.read_excel("packagesum.xlsx")
# print(str(df.ix[:,u'客户经理':u'内袋标贴'][df[u'客户'].str.contains(Client, na = False)][df[u'型号'].str.contains(GoodsType, na = False)]))
ResDf = df.ix[:,u'客户经理':u'留样'][df[u'客户'].str.contains(Client, na = False)][df[u'型号'].str.contains(GoodsType, na = False)]
ResDf[u'重量'] = Weight
print(str(ResDf))
with open('GoodsTypeRes.txt', 'w') as the_file:
   the_file.write(str(ResDf))

This is the header of my excel file.

enter image description here

Upvotes: 1

Views: 3654

Answers (1)

jezrael
jezrael

Reputation: 863031

I think you need parameter names for set column names if does not exist, also header = None can be omit:

#change column names by your data
df = pd.read_excel("packagesum.xlsx", names=['col1','col2','col3', ...])

And then code can be simplify by boolean indexing with DataFrame.to_csv:

mask = df[u'客户'].str.contains(Client, na=False) & df[u'型号'].str.contains(GoodsType, na=False)
ResDf = df.loc[mask,u'客户经理':u'留样']
ResDf[u'重量'] = Weight


ResDf.to_csv('GoodsTypeRes.tx', header=False)

Another solution is select columns by position with iloc.

df = pd.read_excel("packagesum.xlsx", header=None)

#check positions if corrects, python starts from 0 for first position
mask = df.iloc[:, 2].str.contains(Client, na=False) & df.iloc[:, 4].str.contains(GoodsType, na=False)
#all columns
ResDf = df[mask].copy()
#add new column to position 10 what is same as column name
ResDf[10] = Weight


ResDf.to_csv('GoodsTypeRes.tx', header=False)

Sample:

np.random.seed(345)
N = 10
df = pd.DataFrame({0:np.random.choice(list('abc'), size=N),
                   1:np.random.choice([8,7,0], size=N),
                   2:np.random.choice(list('abc'), size=N),
                   3:np.random.randint(10, size=N),
                   4:np.random.choice(list('abc'), size=N),
                   5:np.random.choice([2,0], size=N),
                   6:np.random.choice(list('abc'), size=N),
                   7:np.random.randint(10, size=N),
                   8:np.random.choice(list('abc'), size=N),
                   9:np.random.choice([np.nan,0], size=N),
                   10:np.random.choice([1,0], size=N),
                   11:np.random.choice(list('abc'), size=N)})
print (df)
  0   1  2   3  4   5  6   7  8    9   10 11
0  a   7  b   6  a   2  a   7  c  0.0   1  b
1  a   8  b   3  b   0  a   7  a  NaN   0  b
2  b   8  b   3  b   2  a   8  c  NaN   1  b
3  b   7  a   4  b   2  b   6  b  NaN   0  b
4  c   0  b   2  c   2  c   7  a  NaN   1  b
5  a   0  a   8  c   2  b   1  c  NaN   1  b
6  a   8  b   5  c   2  a   5  a  0.0   0  a
7  b   8  a   2  c   0  a   1  a  NaN   1  c
8  a   8  a   6  b   2  c   4  a  NaN   0  c
9  c   0  b   2  a   0  b   2  c  0.0   0  b

Client = 'a'
GoodsType = 'b'
Weight = 10

mask = df.iloc[:, 2].str.contains(Client, na=False) & df.iloc[:, 4].str.contains(GoodsType, na=False)
ResDf = df[mask].copy()

ResDf[10] = Weight
print (ResDf)
  0   1  2   3  4   5  6   7  8   9   10 11
3  b   7  a   4  b   2  b   6  b NaN  10  b
8  a   8  a   6  b   2  c   4  a NaN  10  c

Upvotes: 2

Related Questions