user3425989

Reputation: 41

Deleting the contents of common columns in a dataframe

I have a DataFrame with 8 columns. Some rows differ only in certain columns. I would like to blank out the entries that repeat values from earlier rows. Here is what I have:

|C1|C2|C3|
|A |B |C |
|A |B |D |

Here is what I want:

|C1|C2|C3|
|A |B |C |
|  |  |D |

Upvotes: 2

Views: 86

Answers (3)

Andreas

Reputation: 9197

You can use duplicated:

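import pandas as pd

df = pd.DataFrame({'C1':['A','A'], 'C2':['B','B'], 'C3':['C', 'D']})

# duplicated() is True where a value already appeared higher up in its column;
# ~ flips it so first occurrences are True. Multiplying keeps a string where
# True (True * 'A' == 'A') and blanks it where False (False * 'A' == '')
df = ~df.apply(pd.Series.duplicated) * df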

Output:

  C1 C2 C3
0  A  B  C
1        D
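The multiplication trick relies on Python string semantics: True * 'A' is 'A' and False * 'A' is '', so it only blanks cells in string (object) columns; in a numeric column the duplicates would become 0 instead of blank. A minimal sketch of the same per-column idea using DataFrame.mask, which writes the replacement value directly (the variable names dupes and result are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'C1': ['A', 'A'], 'C2': ['B', 'B'], 'C3': ['C', 'D']})

# True wherever a value already appeared higher up in the same column
dupes = df.apply(pd.Series.duplicated)

# Replace exactly those cells with an empty string, leave the rest untouched
result = df.mask(dupes, '')
print(result)
```

Because mask() does not depend on how * behaves for each dtype, the blanking is the same for string and numeric columns.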

Upvotes: 0

Gusti Adli

Reputation: 1213

You can iterate over the columns, use pandas' .duplicated() to find the duplicated values, and replace them with empty strings.

for col in df.columns:
    df.loc[df[col].duplicated(), col] = ''

Alternatively, you can wrap it in a function and use .apply():

def replace_duplicates(series):
    # Return a new Series with repeated values blanked, instead of
    # mutating the Series that .apply() passes in
    return series.mask(series.duplicated(), '')

df = df.apply(replace_duplicates)
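Both versions blank a value if it appeared anywhere earlier in its column, not only in the row directly above. If the intent is to blank only cells that repeat the previous row (one possible reading of "rows differ only in certain columns"; the question does not settle it), a sketch that compares each row with a shifted copy of the frame could look like this. The function name blank_repeats_from_previous_row and the sample data are hypothetical:

```python
import pandas as pd

def blank_repeats_from_previous_row(df):
    # df.shift() moves every row down by one, so each cell lines up with the
    # cell directly above it; the first row is compared against missing values
    # and is therefore always kept
    same_as_above = df.eq(df.shift())
    # Compare against the original rows, then blank the matching cells
    return df.mask(same_as_above, '')

df = pd.DataFrame({'C1': ['A', 'A', 'X', 'A'],
                   'C2': ['B', 'B', 'B', 'B'],
                   'C3': ['C', 'D', 'D', 'E']})
print(blank_repeats_from_previous_row(df))
```

Here the 'A' in the last row of C1 is kept, because the row above it holds 'X'; the column-wise duplicated() approach would blank it.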

Upvotes: 0

Andrej Kesely

Reputation: 195418

Try:

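import numpy as np
import pandas as pd

df = pd.DataFrame({'C1':['A','A'], 'C2':['B','B'], 'C3':['C', 'D']})

# Start with every cell marked for blanking
mask = np.ones(df.size, dtype=bool)
# np.unique works on the flattened (row-by-row) values and returns the
# index of each value's first occurrence
_, idx = np.unique(df.to_numpy().ravel(), return_index=True)
mask[idx] = False  # keep first occurrences
df[mask.reshape(df.shape)] = ""
print(df)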

Prints:

  C1 C2 C3
0  A  B  C
1        D
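Unlike the column-wise answers, np.unique here runs over the whole flattened array, so a value is blanked if it appeared earlier in any column, scanning row by row. For the sample data the results match, but they can differ when the same value occurs in more than one column. A small sketch comparing both behaviours on a made-up frame (the helper names and data are hypothetical):

```python
import numpy as np
import pandas as pd

def blank_per_column(df):
    # Duplicates are judged separately within each column
    return df.mask(df.apply(pd.Series.duplicated), '')

def blank_whole_frame(df):
    # Duplicates are judged across all cells, scanning row by row (C order)
    mask = np.ones(df.size, dtype=bool)
    _, idx = np.unique(df.to_numpy().ravel(), return_index=True)
    mask[idx] = False  # keep the first occurrence of each value
    return df.mask(mask.reshape(df.shape), '')

df = pd.DataFrame({'C1': ['A', 'B'], 'C2': ['B', 'C']})
print(blank_per_column(df))   # unchanged: no column repeats a value
print(blank_whole_frame(df))  # the 'B' in row 1, C1 is blanked (seen in row 0, C2)
```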

Upvotes: 1
