Reputation: 2333
I have a dataframe that looks like this:
   ID       AgeGroups      PaperIDs
0   1      [3, 3, 10]     [A, B, C]
1   2             [5]           [D]
2   3         [4, 12]        [A, D]
3   4  [2, 6, 13, 12]  [X, Z, T, D]
I would like to extract the rows where the list in the AgeGroups
column has at least 2 values less than 7 and at least 1 value greater than 8.
So the result should look like this:
   ID       AgeGroups      PaperIDs
0   1      [3, 3, 10]     [A, B, C]
3   4  [2, 6, 13, 12]  [X, Z, T, D]
I'm not sure how to do it.
Upvotes: 1
Views: 67
Reputation: 504
I wrote simple if/else logic that is applied to each row with the apply function; you could also use a list comprehension over the rows.
import pandas as pd

data = {'ID':['1', '2', '3', '4'], 'AgeGroups':[[3,3,10],[5],[4,12],[2,6,13,12]],'PaperIDs':[['A','B','C'],['D'],['A','D'],['X','Z','T','D']]}
df = pd.DataFrame(data)

def extract_age(row):
    my_list = row['AgeGroups']
    count1 = 0  # number of values less than 7
    count2 = 0  # number of values greater than 8
    # at least 3 values are needed to satisfy both conditions
    if len(my_list) >= 3:
        for i in my_list:
            if i < 7:
                count1 = count1 + 1
            elif i > 8:
                count2 = count2 + 1
    if (count1 >= 2) and (count2 >= 1):
        print(row['AgeGroups'], row['PaperIDs'])

df.apply(lambda x: extract_age(x), axis=1)
Output
[3, 3, 10] ['A', 'B', 'C']
[2, 6, 13, 12] ['X', 'Z', 'T', 'D']
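The function above only prints the matching rows. If you want the filtered DataFrame itself, one option is to return a boolean per row and use the result as a mask. This is a minimal sketch, not part of the original answer; the helper name has_enough_ages is hypothetical and the data follows the question:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'AgeGroups': [[3, 3, 10], [5], [4, 12], [2, 6, 13, 12]],
    'PaperIDs': [['A', 'B', 'C'], ['D'], ['A', 'D'], ['X', 'Z', 'T', 'D']],
})

def has_enough_ages(ages):
    # hypothetical helper: True if at least 2 values are < 7
    # and at least 1 value is > 8
    low = sum(1 for a in ages if a < 7)
    high = sum(1 for a in ages if a > 8)
    return low >= 2 and high >= 1

# apply the helper to every list, then filter with the boolean mask
mask = df['AgeGroups'].apply(has_enough_ages)
result = df[mask]
print(result)
```

Returning a boolean keeps the counting logic separate from the filtering, so the same helper can be reused in a list comprehension as well.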
Upvotes: 2
Reputation: 863501
First create a helper DataFrame and compare it with DataFrame.lt and DataFrame.gt, count the matches per row with sum, then compare the resulting Series with Series.ge and chain both masks with & for bitwise AND:
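import ast
import pandas as pd

#if the column holds strings instead of lists
#df['AgeGroups'] = df['AgeGroups'].apply(ast.literal_eval)

#helper DataFrame, one column per list position; reuse df's index so the masks align
df1 = pd.DataFrame(df['AgeGroups'].tolist(), index=df.index)
df = df[df1.lt(7).sum(axis=1).ge(2) & df1.gt(8).sum(axis=1).ge(1)]
print (df)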
   ID       AgeGroups      PaperIDs
0   1      [3, 3, 10]     [A, B, C]
3   4  [2, 6, 13, 12]  [X, Z, T, D]
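To see why the helper DataFrame works, it may help to look at the intermediate steps: shorter lists are padded with NaN, and comparisons against NaN are False, so the padding never counts toward either condition. A small sketch using the AgeGroups values from the question (the names low_count and high_count are only for illustration):

```python
import pandas as pd

ages = pd.Series([[3, 3, 10], [5], [4, 12], [2, 6, 13, 12]])

# one column per list position; shorter rows are padded with NaN
df1 = pd.DataFrame(ages.tolist(), index=ages.index)
print(df1)

low_count = df1.lt(7).sum(axis=1)   # values less than 7 per row
high_count = df1.gt(8).sum(axis=1)  # values greater than 8 per row

# element-wise AND of the two boolean Series
mask = low_count.ge(2) & high_count.ge(1)
print(mask.tolist())
```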
Or use a list comprehension that converts each list to a numpy array, counts the matches with sum, and compares both counts chained by and, because the counts are scalars:
import numpy as np

m = [(np.array(x) < 7).sum() >= 2 and (np.array(x) > 8).sum() >= 1 for x in df['AgeGroups']]
df = df[m]
print (df)
   ID       AgeGroups      PaperIDs
0   1      [3, 3, 10]     [A, B, C]
3   4  [2, 6, 13, 12]  [X, Z, T, D]
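As for why and is fine inside the list comprehension while & is needed for the masks: each sum there yields a single number, so each comparison is one boolean, whereas comparing a Series yields many booleans and its truth value is ambiguous. A short sketch with toy values (not taken from the answer):

```python
import numpy as np
import pandas as pd

x = [3, 3, 10]
# both counts are scalars, so Python's `and` works
scalar_ok = (np.array(x) < 7).sum() >= 2 and (np.array(x) > 8).sum() >= 1
print(scalar_ok)

a = pd.Series([True, False])
b = pd.Series([True, True])
print((a & b).tolist())  # element-wise AND of two boolean Series

try:
    a and b  # the truth value of a Series is ambiguous
    raised = False
except ValueError:
    raised = True
print(raised)
```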
Upvotes: 3