Reputation: 567
This question is actually about 2 cases:
1. the dataframe part
for the dataframe
df = pd.DataFrame({'Company': ['Huawei','Huawei','Huawei', 'Apple', 'Apple', 'Samsung', 'Samsung'],
'Year': [2011, 2011, 2018, 2011, 2019, 2018, 2019],
'Product': ['H1', 'H2', 'H3', 'A1', 'A2', 'S1', 'S2']})
df = df.sort_values(by=['Company', 'Year'])
df
i.e.
Company Year Product 3 Apple 2011 A1 4 Apple 2019 A2 0 Huawei 2011 H1 1 Huawei 2011 H2 2 Huawei 2018 H3 5 Samsung 2018 S1 6 Samsung 2019 S2
what I need is mergeCell(df, on = ['Company'])
returns
Company Year Product 3 Apple 2011 A1 4 2019 A2 0 Huawei 2011 H1 1 2011 H2 2 2018 H3 5 Samsung 2018 S1 6 2019 S2
while mergeCell(df, on = ['Company', 'Year'])
returns
Company Year Product 3 Apple 2011 A1 4 Apple 2019 A2 0 Huawei 2011 H1 1 H2 2 Huawei 2018 H3 5 Samsung 2018 S1 6 Samsung 2019 S2
I wrote one, but obviously it is not terse and has bug
def mergeCell(df, on):
import copy
dfMerged = df[on]
dfTmp = np.empty((df.shape[0], len(on)), dtype=object)
lastRow = ()
idx = 0
for row in dfMerged.itertuples():
if idx == 0:
lastRow = row[1:]
dfTmp[idx, :] = lastRow
else:
if row[1:] != lastRow:
lastRow = row[1:]
dfTmp[idx, :] = lastRow
else:
dfTmp[idx, :] = np.empty((1, len(on)), dtype=object)
idx += 1
dfTmp = pd.DataFrame(dfTmp)
dfTmp.columns = on
dfCopied = copy.deepcopy(df)
for idxRow in range(df.shape[0]):
for idxCol in on:
dfCopied.loc[idxRow, idxCol] = dfTmp.loc[idxRow, idxCol]
return dfCopied
so, is there a built-in way to do so?
2. save the result dataframe to excel with merged cell and the text is on vertical center
for this part, I have no idea except for doing what I have do in the above mergeCell
function
thanks
Upvotes: 0
Views: 425
Reputation: 164693
So, is there a built-in way to do so?
Yes, you can use use duplicated
. But be careful, an "empty cell" in Pandas may mean one of two things: either NaN
or the empty string ''
. Since you are concerned about presentation, I assume you want the latter.
Example 1: pd.Series.duplicated
col = 'Company'
df[col] = df[col].mask(df[col].duplicated(), '')
print(df)
# Company Year Product
# 3 Apple 2011 A1
# 4 2019 A2
# 0 Huawei 2011 H1
# 1 2011 H2
# 2 2018 H3
# 5 Samsung 2018 S1
# 6 2019 S2
Example 2: pd.DataFrame.duplicated
cols = ['Company', 'Year']
df[cols] = df[cols].mask(df[cols].duplicated(), '')
print(df)
# Company Year Product
# 3 Apple 2011 A1
# 4 Apple 2019 A2
# 0 Huawei 2011 H1
# 1 H2
# 2 Huawei 2018 H3
# 5 Samsung 2018 S1
# 6 Samsung 2019 S2
Upvotes: 1