Kenneth Chan
Kenneth Chan

Reputation: 47

How to use python pandas to combine the same name into one, without removing other columns

Here is the example

    name        year     date      start opp
0   A.J. Price  2015     2014-12-02 No  MIL
1   A.J. Price  2015     2014-12-04 No  NYK
2   A.J. Price  2015     2014-12-05 No  TOR
3   A.J. Price  2015     2014-12-08 No  BRK
4   A.J. Price  2015     2014-12-09 No  TOR
318 Aaron       2015     2014-12-15 No  ATL
319 Aaron       2015     2014-12-18 No  NYK
320 Aaron       2015     2014-12-19 No  MEM

How to make the data frame above into something hierarchy like below

0   A.J. Price  2015     2014-12-02 No  MIL
                2015     2014-12-04 No  NYK
                2015     2014-12-05 No  TOR 
                2015     2014-12-08 No  BRK
                2015     2014-12-09 No  TOR  
318 Aaron       2015     2014-12-15 No  ATL
                2015     2014-12-18 No  NYK
                2015     2014-12-19 No  MEM

Upvotes: 0

Views: 299

Answers (1)

Abbas
Abbas

Reputation: 4069

With the help given by EdChum here is how it can be done:

In [11]: df
Out[11]:
   name         year       date        start  opp
0  A.J. Price      2015    2014-12-02   No    MIL
1  A.J. Price      2015    2014-12-04   No    NYK
2  A.J. Price      2015    2014-12-05   No    TOR
3  A.J. Price      2015    2014-12-08   No    BRK
4  A.J. Price      2015    2014-12-09   No    TOR
5  Aaron           2015    2014-12-15   No    ATL
6  Aaron           2015    2014-12-18   No    NYK
7  Aaron           2015    2014-12-19   No    MEM

In [12]: df.set_index('name',inplace=True)
In [13]: df.set_index('year',append=True, inplace=True)

In [14]: df
Out[14]:
                          date  start  opp
name        year
A.J. Price  2015    2014-12-02   No    MIL
            2015    2014-12-04   No    NYK
            2015    2014-12-05   No    TOR
            2015    2014-12-08   No    BRK
            2015    2014-12-09   No    TOR
Aaron       2015    2014-12-15   No    ATL
            2015    2014-12-18   No    NYK
            2015    2014-12-19   No    MEM

Update:

While writing out multi-index tables (or pivot_tables) to_csv will write the hierarchical index of each row, resulting in data as follows:

enter image description here

However using to_excel will write the output like this by merging the rows of hierarchical index:

enter image description here

So if the concern is how to get back the multi-index while reading the csv back, use the index_col argument of the function:

pd.read_csv('input.csv', index_col=[0,1])

Here is another link that will help you to write the csv the way you wanted.

Upvotes: 1

Related Questions