Reputation: 570
I currently have the following DataFrame with an id column and a column called "childOrParent". A group (all rows sharing the same id) cannot have children without a Parent.
+----+---------------+
| id | childOrParent |
+----+---------------+
| 1 | Parent |
| 1 | child |
| 2 | Parent |
| 3 | child |
| 3 | child |
| 3 | Parent |
+----+---------------+
How do I check whether the DataFrame is valid? If there is an id group where there are only children, then I need to know the id.
For example, the following DataFrame would be invalid, and I need to know that the offending id is 3:
+----+---------------+
| id | childOrParent |
+----+---------------+
| 1 | Parent |
| 1 | child |
| 2 | Parent |
| 3 | child |
| 3 | child |
| 3 | child |
+----+---------------+
I've tried getting the counts of children and parents within each group and then merging the two DataFrames, but that doesn't seem to be right.
Upvotes: 0
Views: 31
Reputation: 323326
Using groupby with filter + all:
df.groupby('id').filter(lambda x : (x['childOrParent']=='child').all())
Out[383]:
   id childOrParent
3   3         child
4   3         child
5   3         child
df.groupby('id').filter(lambda x : (x['childOrParent']=='child').all()).id.unique()
Out[384]: array([3], dtype=int64)
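For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt from the invalid example in the question. The names `orphan_rows`, `orphan_ids`, `has_parent`, `ids_without_parent` and `is_valid` are my own, not from the original post. The second variant is an alternative I'm adding: it asks "does this id have any Parent row?" instead of "are all rows child?", so the two only agree as long as the column holds nothing but "child" and "Parent".

```python
import pandas as pd

# Sample data copied from the invalid example in the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "childOrParent": ["Parent", "child", "Parent", "child", "child", "child"],
})

# Approach from the answer: keep only the groups where every row is a child.
orphan_rows = df.groupby("id").filter(
    lambda g: (g["childOrParent"] == "child").all()
)
orphan_ids = orphan_rows["id"].unique().tolist()

# Alternative without a Python-level lambda: flag the Parent rows,
# then check per id whether at least one row is a Parent.
has_parent = df["childOrParent"].eq("Parent").groupby(df["id"]).any()
ids_without_parent = has_parent[~has_parent].index.tolist()

# The DataFrame is valid only if no id is missing a Parent.
is_valid = not ids_without_parent

print(orphan_ids, ids_without_parent, is_valid)
```

If all you need is a yes/no answer plus the offending ids, the second form is probably enough, since it never materialises the filtered rows.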
Upvotes: 2