noobie-php

Reputation: 7233

Replace duplicates with NaN in a pandas Series

I have a Series in which I want to replace the duplicated values with NaN or with an empty string. Here is my code:

import pandas as pd

data_dict = [{"Geo": "Canada"}, {"Geo": "Sri Lanka"}, {"Geo": "Lahore"}, {"Geo": "Karachi"}, {"Geo": "Islamabad"},
             {"Geo": "Other"}, {"Pipelines": "Sri Lanka"}, {"Pipelines": "Canada Exec"}, {"Pipelines": "USA SuperSA"},
             {"Pipelines": "Others"}]

df = pd.DataFrame(data_dict)
stacked_df = df.stack()
print(stacked_df)

The Series output is as follows:

0  Geo               Canada
1  Geo            Sri Lanka
2  Geo               Lahore
3  Geo              Karachi
4  Geo            Islamabad
5  Geo                Other
6  Pipelines      Sri Lanka
7  Pipelines    Canada Exec
8  Pipelines    USA SuperSA
9  Pipelines         Others
dtype: object

The desired output, without the index, is:

  Geo               Canada
                 Sri Lanka
                    Lahore
                   Karachi
                 Islamabad
                     Other
  Pipelines      Sri Lanka
               Canada Exec
               USA SuperSA
                    Others
dtype: object

Upvotes: 0


Answers (1)

Quang Hoang

Reputation: 150745

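First, stacked_df is not a DataFrame; it is a Series. Second, Geo and Pipelines are in the index (the inner level of a MultiIndex), not in a regular column. That said, to obtain the desired output, I would move that level into a column and blank out its repeated labels: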

(stacked_df.reset_index(level=1)
    .assign(level_1=lambda x: x.level_1.mask(x.level_1.duplicated(),""))
)

Output:

     level_1            0
0        Geo       Canada
1               Sri Lanka
2                  Lahore
3                 Karachi
4               Islamabad
5                   Other
6  Pipelines    Sri Lanka
7             Canada Exec
8             USA SuperSA
9                  Others
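
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The names "row", "group", and "value" are illustrative choices of mine, not from the question, and the dropna() call is an assumption added so the stacked result has the same ten rows regardless of whether stack() drops missing values in your pandas version. It also shows the NaN variant asked about in the title: calling mask() without a replacement value fills the masked positions with NaN.

```python
import pandas as pd

data_dict = [
    {"Geo": "Canada"}, {"Geo": "Sri Lanka"}, {"Geo": "Lahore"},
    {"Geo": "Karachi"}, {"Geo": "Islamabad"}, {"Geo": "Other"},
    {"Pipelines": "Sri Lanka"}, {"Pipelines": "Canada Exec"},
    {"Pipelines": "USA SuperSA"}, {"Pipelines": "Others"},
]

# Stack into a Series with a two-level index; dropna() is an added safeguard
# so that missing Geo/Pipelines combinations never appear in the result.
stacked = pd.DataFrame(data_dict).stack().dropna()

# Name the Series and its index levels (illustrative names), then move the
# inner level (Geo / Pipelines) out of the index into a regular column.
flat = (
    stacked.rename("value")
    .rename_axis(["row", "group"])
    .reset_index(level="group")
)

# True for every label that has already appeared earlier in the column.
is_repeat = flat["group"].duplicated()

# Variant 1: replace repeated labels with an empty string.
blanked = flat.assign(group=flat["group"].mask(is_repeat, ""))

# Variant 2: replace repeated labels with NaN (mask's default fill value).
with_nan = flat.assign(group=flat["group"].mask(is_repeat))

# Print without the row index, as requested in the question.
print(blanked.to_string(index=False))
```

Note that duplicated() flags every repeat across the whole column, not only consecutive ones. That is fine here because each label forms one contiguous block; if the same label could reappear later in a separate block, you would need a different test, such as comparing each label with the previous row.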

Upvotes: 2
