oyster
oyster

Reputation: 567

pandas - how to make same row blank then write to excel with merged cells

This question is actually about 2 cases:

  1. make the same value in column of pandas dataframe to be blank
  2. save the result dataframe to excel with merged cell and the text is on vertical center

1. the dataframe part

for the dataframe

df = pd.DataFrame({'Company': ['Huawei','Huawei','Huawei', 'Apple', 'Apple', 'Samsung', 'Samsung'],
                    'Year': [2011, 2011, 2018, 2011, 2019, 2018, 2019],
                    'Product': ['H1', 'H2', 'H3', 'A1', 'A2', 'S1', 'S2']})
df = df.sort_values(by=['Company', 'Year'])
df

i.e.

   Company    Year  Product
3   Apple       2011   A1
4   Apple       2019   A2
0   Huawei      2011   H1
1   Huawei      2011   H2
2   Huawei      2018   H3
5   Samsung     2018   S1
6   Samsung     2019   S2

what I need is mergeCell(df, on = ['Company']) returns

   Company    Year  Product
3   Apple       2011   A1
4               2019   A2
0   Huawei      2011   H1
1               2011   H2
2               2018   H3
5   Samsung     2018   S1
6               2019   S2

while mergeCell(df, on = ['Company', 'Year']) returns

   Company    Year  Product
3   Apple       2011   A1
4   Apple       2019   A2
0   Huawei      2011   H1
1                      H2
2   Huawei      2018   H3
5   Samsung     2018   S1
6   Samsung     2019   S2

I wrote one, but obviously it is not terse and has bug

def mergeCell(df, on):
    import copy

    dfMerged = df[on]

    dfTmp = np.empty((df.shape[0], len(on)), dtype=object)
    lastRow = ()
    idx = 0
    for row in dfMerged.itertuples():
        if idx == 0:
            lastRow = row[1:]
            dfTmp[idx, :] = lastRow
        else:
            if row[1:] != lastRow:
                lastRow = row[1:]
                dfTmp[idx, :] = lastRow
            else:
                dfTmp[idx, :] = np.empty((1, len(on)), dtype=object)

        idx += 1 

    dfTmp = pd.DataFrame(dfTmp)
    dfTmp.columns = on


    dfCopied = copy.deepcopy(df)
    for idxRow in range(df.shape[0]):
        for idxCol in on:
            dfCopied.loc[idxRow, idxCol] = dfTmp.loc[idxRow, idxCol]

    return dfCopied     

so, is there a built-in way to do so?

2. save the result dataframe to excel with merged cell and the text is on vertical center for this part, I have no idea except for doing what I have do in the above mergeCell function

thanks

Upvotes: 0

Views: 425

Answers (1)

jpp
jpp

Reputation: 164693

So, is there a built-in way to do so?

Yes, you can use use duplicated. But be careful, an "empty cell" in Pandas may mean one of two things: either NaN or the empty string ''. Since you are concerned about presentation, I assume you want the latter.

Example 1: pd.Series.duplicated

col = 'Company'
df[col] = df[col].mask(df[col].duplicated(), '')

print(df)

#    Company  Year Product
# 3    Apple  2011      A1
# 4           2019      A2
# 0   Huawei  2011      H1
# 1           2011      H2
# 2           2018      H3
# 5  Samsung  2018      S1
# 6           2019      S2

Example 2: pd.DataFrame.duplicated

cols = ['Company', 'Year']
df[cols] = df[cols].mask(df[cols].duplicated(), '')

print(df)

#    Company  Year Product
# 3    Apple  2011      A1
# 4    Apple  2019      A2
# 0   Huawei  2011      H1
# 1                     H2
# 2   Huawei  2018      H3
# 5  Samsung  2018      S1
# 6  Samsung  2019      S2

Upvotes: 1

Related Questions