SMS
SMS

Reputation: 382

Print column information for a pandas frame with multiple header

I have a question hoping you can help me out here. I have an excel-file with multiple header, which I read following way

df=pd.read_excel(path,sheet_name="name",header=[1,2,3,4,5,6]

The original excel files looks like this

Required                         1 - Person
Legacy Contact Person Number     Business partner category
Central Data                     Central Data
Header Data                      Header Data
CHAR                             CHAR 
10                               1
PARTNER                          TYPE 
Test 1                           Name 1 
Test 2                           Name 2
Test 3                           Name 3
Test 4                           Name 4
Test 5                           Name 5

Problem: I am trying to find an easy way to have a following statement

df["PARTNER"]=df["PARTNER"].fillna("Missing")

I realized that there is an issue by printing the command

print(df["PARTNER"])

Question: Is there any way on how to print a column in a frame with multiple header and select the header I am interested in, in this case "PARTNER"?

Upvotes: 1

Views: 372

Answers (1)

Stef
Stef

Reputation: 30579

To select (print) data from a particular level of a MultiIndex you can use xs:

df = pd.DataFrame({('Legacy Contact Person Number', 'Central Data', 'Header Data', 'CHAR', '10', 'PARTNER'): ['Test 1', None, 'Test 3', 'Test 4', 'Test 5'],
                   ('Business partner category', 'Central Data', 'Header Data', 'CHAR', '1', 'TYPE'): ['Name 1', 'Name 2', 'Name 3', 'Name 4', 'Name 5']})

print(df.xs(('PARTNER'), level=5, axis=1, drop_level=False))

Output:

  Legacy Contact Person Number
                  Central Data
                   Header Data
                          CHAR
                            10
                       PARTNER
0                       Test 1
1                         None
2                       Test 3
3                       Test 4
4                       Test 5

However, you can't use xs to set data. This is possible with an IndexSlice:

idx = pd.IndexSlice
df.loc[:, idx[:, :, :, :, :, 'PARTNER']] = df.loc[:, idx[:, :, :, :, :, 'PARTNER']].fillna('Missing')

Result:

  Legacy Contact Person Number Business partner category
                  Central Data              Central Data
                   Header Data               Header Data
                          CHAR                      CHAR
                            10                         1
                       PARTNER                      TYPE
0                       Test 1                    Name 1
1                      Missing                    Name 2
2                       Test 3                    Name 3
3                       Test 4                    Name 4
4                       Test 5                    Name 5

Instead of df.loc[:, idx[:, :, :, :, :, 'PARTNER']] you can also use df.loc(axis=1)[:, :, :, :, :, 'PARTNER'].

Upvotes: 1

Related Questions