Contra111
Contra111

Reputation: 327

Pandas, copy rows whose names are repeated N times

Example data:

df1 = pd.DataFrame({
    'file': ['file1','file1','file1','file2','file2','file2','file3','file3','file4'],
    'prop1': [True,False,True,False,False,False,True,False,False],
    'prop2': [False,False,False,False,True,False,False,True,False],
    'prop3': [False,True,False,True,False,True,False,False,True]
})

I need to copy rows whose 'file' repeat 3 times, to get something like this:

file    prop1   prop2   prop3
0   file1   True    False   False
1   file1   False   False   True
2   file1   True    False   False
3   file2   False   False   True
4   file2   False   True    False
5   file2   False   False   True

Upvotes: 3

Views: 76

Answers (4)

ansev
ansev

Reputation: 30930

Also you can use DataFrame.set_index + DataFrame.loc:

new_df=df1.set_index('file').loc[df1.groupby('file').size().eq(3)]
print(new_df)

     prop1  prop2  prop3
file                      
file1   True  False  False
file1  False  False   True
file1   True  False  False
file2  False  False   True
file2  False   True  False
file2  False  False   True

Upvotes: 1

jezrael
jezrael

Reputation: 863166

Use GroupBy.transform for Series with same size like column, so possible filter by boolean indexing:

df = df1[df1.groupby('file')['file'].transform('size') == 3]

Detail:

print (df1.groupby('file')['file'].transform('size'))
0    3
1    3
2    3
3    3
4    3
5    3
6    2
7    2
8    1
Name: file, dtype: int64

Or use filtration:

df = df1.groupby('file').filter(lambda x: len(x) == 3)

Or use Series.map with Series.value_counts:

df = df1[df1['file'].map(df['file'].value_counts()) == 3]

print (df)
    file  prop1  prop2  prop3
0  file1   True  False  False
1  file1  False  False   True
2  file1   True  False  False
3  file2  False  False   True
4  file2  False   True  False
5  file2  False  False   True

Upvotes: 5

BENY
BENY

Reputation: 323316

IIUC transform

df=df1[df1.groupby('file')['file'].transform('count').eq(3)].copy() # esure you do not have copy warning for future modify .
    file  prop1  prop2  prop3
0  file1   True  False  False
1  file1  False  False   True
2  file1   True  False  False
3  file2  False  False   True
4  file2  False   True  False
5  file2  False  False   True

Upvotes: 2

yatu
yatu

Reputation: 88276

You could groupby the file name, transform with the size and use the result to index the dataframe:

df1[df1.groupby('file').prop1.transform('size').eq(3)]

   file   prop1  prop2  prop3
0  file1   True  False  False
1  file1  False  False   True
2  file1   True  False  False
3  file2  False  False   True
4  file2  False   True  False
5  file2  False  False   True

Upvotes: 3

Related Questions