Reputation: 7233
I have a Series in which I want to replace the duplicated values with NaN or with an empty string. The following is my code:
import pandas as pd
data_dict = [{"Geo": "Canada"}, {"Geo": "Sri Lanka"}, {"Geo": "Lahore"}, {"Geo": "Karachi"}, {"Geo": "Islamabad"},
{"Geo": "Other"}, {"Pipelines": "Sri Lanka"}, {"Pipelines": "Canada Exec"}, {"Pipelines": "USA SuperSA"},
{"Pipelines": "Others"}]
df = pd.DataFrame(data_dict)
stacked_df = df.stack()
print(stacked_df)
The Series output is as follows:
0 Geo Canada
1 Geo Sri Lanka
2 Geo Lahore
3 Geo Karachi
4 Geo Islamabad
5 Geo Other
6 Pipelines Sri Lanka
7 Pipelines Canada Exec
8 Pipelines USA SuperSA
9 Pipelines Others
dtype: object
The desired output is the following, without the index:
Geo Canada
Sri Lanka
Lahore
Karachi
Islamabad
Other
Pipelines Sri Lanka
Canada Exec
USA SuperSA
Others
dtype: object
Upvotes: 0
Views: 60
Reputation: 150745
First, stacked_df is not a DataFrame; it is a Series. Second, Geo and Pipelines are in the index, not in a regular column. That said, to obtain the desired output, I would do:
(stacked_df.reset_index(level=1)
.assign(level_1=lambda x: x.level_1.mask(x.level_1.duplicated(),""))
)
Output:
level_1 0
0 Geo Canada
1 Sri Lanka
2 Lahore
3 Karachi
4 Islamabad
5 Other
6 Pipelines Sri Lanka
7 Canada Exec
8 USA SuperSA
9 Others
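For reference, here is a self-contained sketch of the same idea that can be run end to end. The level names row and group and the column name value are my own labels for illustration, not anything the question or pandas defines, and the dropna() call is only a guard in case stack() keeps missing entries in your pandas version. It also shows the NaN variant the question asks about.

```python
import pandas as pd

data_dict = [
    {"Geo": "Canada"}, {"Geo": "Sri Lanka"}, {"Geo": "Lahore"},
    {"Geo": "Karachi"}, {"Geo": "Islamabad"}, {"Geo": "Other"},
    {"Pipelines": "Sri Lanka"}, {"Pipelines": "Canada Exec"},
    {"Pipelines": "USA SuperSA"}, {"Pipelines": "Others"},
]

# Stack the columns into the index; dropna() is a guard for pandas
# versions in which stack() does not drop missing entries itself.
stacked = pd.DataFrame(data_dict).stack().dropna()

# Name the index levels (hypothetical names) and move the inner level,
# which holds "Geo"/"Pipelines", into a regular column.
flat = stacked.rename_axis(["row", "group"]).reset_index(level="group", name="value")

# True for every repeat of a label after its first occurrence.
dup = flat["group"].duplicated()

# Variant 1: blank out the repeats with an empty string.
blank = flat.assign(group=flat["group"].mask(dup, ""))

# Variant 2: replace the repeats with NaN (mask's default replacement).
nan_version = flat.assign(group=flat["group"].mask(dup))

# Print without the integer index.
print(blank.to_string(index=False))
```

Note that duplicated() flags every repeat in the column, not only consecutive ones, so this assumes each label forms one contiguous block, as it does in the question's data.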
Upvotes: 2