Reputation: 327
Example data:
df1 = pd.DataFrame({
'file': ['file1','file1','file1','file2','file2','file2','file3','file3','file4'],
'prop1': [True,False,True,False,False,False,True,False,False],
'prop2': [False,False,False,False,True,False,False,True,False],
'prop3': [False,True,False,True,False,True,False,False,True]
})
I need to copy rows whose 'file' repeat 3 times, to get something like this:
file prop1 prop2 prop3
0 file1 True False False
1 file1 False False True
2 file1 True False False
3 file2 False False True
4 file2 False True False
5 file2 False False True
Upvotes: 3
Views: 76
Reputation: 30930
Also you can use DataFrame.set_index + DataFrame.loc:
new_df=df1.set_index('file').loc[df1.groupby('file').size().eq(3)]
print(new_df)
prop1 prop2 prop3
file
file1 True False False
file1 False False True
file1 True False False
file2 False False True
file2 False True False
file2 False False True
Upvotes: 1
Reputation: 863166
Use GroupBy.transform
for Series with same size like column, so possible filter by boolean indexing
:
df = df1[df1.groupby('file')['file'].transform('size') == 3]
Detail:
print (df1.groupby('file')['file'].transform('size'))
0 3
1 3
2 3
3 3
4 3
5 3
6 2
7 2
8 1
Name: file, dtype: int64
Or use filtration
:
df = df1.groupby('file').filter(lambda x: len(x) == 3)
Or use Series.map
with Series.value_counts
:
df = df1[df1['file'].map(df['file'].value_counts()) == 3]
print (df)
file prop1 prop2 prop3
0 file1 True False False
1 file1 False False True
2 file1 True False False
3 file2 False False True
4 file2 False True False
5 file2 False False True
Upvotes: 5
Reputation: 323316
IIUC transform
df=df1[df1.groupby('file')['file'].transform('count').eq(3)].copy() # esure you do not have copy warning for future modify .
file prop1 prop2 prop3
0 file1 True False False
1 file1 False False True
2 file1 True False False
3 file2 False False True
4 file2 False True False
5 file2 False False True
Upvotes: 2
Reputation: 88276
You could groupby
the file
name, transform with the size
and use the result to index the dataframe:
df1[df1.groupby('file').prop1.transform('size').eq(3)]
file prop1 prop2 prop3
0 file1 True False False
1 file1 False False True
2 file1 True False False
3 file2 False False True
4 file2 False True False
5 file2 False False True
Upvotes: 3