Marc Niet

Reputation: 13

How do I filter a certain column, removing repeated data?

I want to pass the weights to a histogram, but with each name counted only once.

df = pd.DataFrame({'Name': ['Bob', 'Simon', 'Bill', 'Mary', 'Mary', 'Bob'],
                   'Weight': [70, 72, 71, 67, 67, 70]})

Expected output:

Bob    70
Simon  72
Bill   71
Mary   67 

Upvotes: 0

Views: 39

Answers (4)

Muhammad Mohsin Khan

Reputation: 1476

Do the following:

df = df.drop_duplicates(subset=['Name', 'Weight'])
print(df)

Output:

    Name  Weight
0    Bob      70
1  Simon      72
2   Bill      71
3   Mary      67
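
One thing to watch with the subset: a row is dropped only when every listed column matches an earlier row. Here is a minimal sketch of the difference, using a made-up frame (df_conflict is hypothetical, not the asker's data) in which Bob has two different weights:

```python
import pandas as pd

# Hypothetical data: Bob appears twice with two different weights.
df_conflict = pd.DataFrame({'Name': ['Bob', 'Simon', 'Bob'],
                            'Weight': [70, 72, 75]})

# Deduplicating on both columns keeps both Bob rows, since the pairs differ.
by_pair = df_conflict.drop_duplicates(subset=['Name', 'Weight'])

# Deduplicating on Name alone keeps only the first row seen for each name.
by_name = df_conflict.drop_duplicates(subset='Name')

print(by_pair)
print(by_name)
```

With the data in the question both calls give the same result, because every repeated name carries the same weight.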

Upvotes: 0

Rajarshi Ghosh

Reputation: 462

We can use the groupby function with mean as the aggregate function.

The data looks like this:

>>> df = pd.DataFrame({'Name': ['Bob', 'Simon', 'Bill', 'Mary', 'Mary', 'Bob'], 'Weight': [70, 72, 71, 67, 67, 70]})
>>> print(df)

    Name  Weight
0    Bob      70
1  Simon      72
2   Bill      71
3   Mary      67
4   Mary      67
5    Bob      70

>>> df2 = df.groupby(['Name']).mean()

>>> print(df2)

       Weight
Name
Bill     71.0
Bob      70.0
Mary     67.0
Simon    72.0

Convert the Name index into a regular column and replace the index with a RangeIndex:

>>> df2['Name'] = df2.index
>>> df2 = df2[['Name', 'Weight']]
>>> df2.set_index(pd.RangeIndex(start=0, stop=len(df2), step=1), inplace=True)
>>> print(df2)

    Name  Weight
0   Bill    71.0
1    Bob    70.0
2   Mary    67.0
3  Simon    72.0
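
A shorter route to the same shape, offered as a sketch rather than the only way: passing as_index=False to groupby keeps Name as a regular column and leaves a default integer index, so the index juggling is not needed.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'Simon', 'Bill', 'Mary', 'Mary', 'Bob'],
                   'Weight': [70, 72, 71, 67, 67, 70]})

# as_index=False keeps Name as an ordinary column instead of the index.
df2 = df.groupby('Name', as_index=False)['Weight'].mean()
print(df2)
```

Calling reset_index() on the grouped result should give an equivalent frame.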

Upvotes: 0

Code Different

Reputation: 93161

You need a groupby:

df.groupby('Name')['Weight'].mean()

If you want to take just the first data point available for each name:

df.groupby('Name')['Weight'].first()
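
The two only differ when a name has more than one distinct weight. A small sketch with made-up data (df_demo is hypothetical) to show the difference:

```python
import pandas as pd

# Hypothetical data where Bob has two different recorded weights.
df_demo = pd.DataFrame({'Name': ['Bob', 'Mary', 'Bob'],
                        'Weight': [70, 67, 74]})

# mean() averages all rows for a name; first() keeps the first row seen.
means = df_demo.groupby('Name')['Weight'].mean()
firsts = df_demo.groupby('Name')['Weight'].first()

print(means)
print(firsts)
```

Note that groupby sorts the group keys by default, so the names come back in alphabetical order. Passing sort=False keeps them in order of first appearance, which matches the output asked for in the question.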

Upvotes: 0

Corralien

Reputation: 120429

Use drop_duplicates:

out = df.drop_duplicates(['Name', 'Weight'])
print(out)

# Output
    Name  Weight
0    Bob      70
1  Simon      72
2   Bill      71
3   Mary      67
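
Since the goal is a histogram, here is a rough sketch of feeding the deduplicated weights into one. It uses numpy.histogram to compute the bin counts, and the choice of 3 bins is an arbitrary assumption, not something from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'Simon', 'Bill', 'Mary', 'Mary', 'Bob'],
                   'Weight': [70, 72, 71, 67, 67, 70]})

# One row per (Name, Weight) pair, so each person is counted once.
out = df.drop_duplicates(['Name', 'Weight'])

# bins=3 is an illustrative assumption; pick whatever suits the data.
counts, edges = np.histogram(out['Weight'], bins=3)
print(counts)
print(edges)
```

For a plot instead of raw counts, out['Weight'].plot.hist() should work if matplotlib is installed.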

Upvotes: 1
