Reputation: 3196
I have a DataFrame
like below:
VID value
1 1 xx
2 2 xx1
3 2 xx2
4 2 xx3
5 2 xx4
6 3 xx
7 3 xx
8 3 xx
9 4 zz1
10 4 zz2
11 4 zz3
12 4 zz4
13 4 zz5
I want to keep only the rows whose VID exists in more than n=3 cases. In the above example, I'd keep only rows 2-5 and 9-13 (since only the VIDs 2 and 4 appear more than 3 times). The 'value' field is irrelevant.
After keeping the needed rows, I'd like to convert my data as a list of lists while appending an 'end' value at the end of each one:
[[xx1, xx2, xx3, xx4, 'end'], [zz1, zz2, zz3, zz4, zz5, 'end']]
Is it possible to get the above without a for
loop?
Upvotes: 6
Views: 2917
Reputation: 90979
You can group by VID
column and then take count of each group. Then after that use it for indexing your original df, so as to only take rows that have greater than 3 count. Example -
countdf = df.groupby('VID').count()
result = df.loc[df['VID'].isin(countdf[countdf['value'] > 3].index)]
Demo -
In [49]: df
Out[49]:
VID value
1 1 xx
2 2 xx1
3 2 xx2
4 2 xx3
5 2 xx4
6 3 xx
7 3 xx
8 3 xx
9 4 zz1
10 4 zz2
11 4 zz3
12 4 zz4
13 4 zz5
In [51]: df.groupby('VID').count()
Out[51]:
value
VID
1 1
2 4
3 3
4 5
In [52]: countdf = df.groupby('VID').count()
In [53]: df.loc[df['VID'].isin(countdf[countdf['value'] > 3].index)]
Out[53]:
VID value
2 2 xx1
3 2 xx2
4 2 xx3
5 2 xx4
9 4 zz1
10 4 zz2
11 4 zz3
12 4 zz4
13 4 zz5
Then after this, you can again group based on VID
and then convert the groups into list
and then again to list , Example -
resultlist = result.groupby('VID')['value'].apply(list).tolist()
Demo -
In [54]: result = df.loc[df['VID'].isin(countdf[countdf['value'] > 3].index)]
In [55]: result.groupby('VID')['value'].apply(list).tolist()
Out[55]: [['xx1', 'xx2', 'xx3', 'xx4'], ['zz1', 'zz2', 'zz3', 'zz4', 'zz5']]
Please note, above would not get you the 'end'
value in the list, I am guessing that is not necessary, but if you really want that, you can manually add it after getting the list. Example -
resultlist = [elem + ['end'] for elem in resultlist]
Upvotes: 6