Python - grouping with conditions

Question

I'm looking for an elegant approach to the following problem.

I'm working from a DataFrame with 15 columns and 1250 rows of chemical compound information (one compound per row). One column, "molecular_mass", holds numbers that I would like to use as a handle to create groups of 100 compounds each, such that no "molecular_mass" value in a group is within +/- 1 of any other value in that group.

I'm currently doing the following to get randomized groups of 100, but it doesn't help with keeping the "molecular_mass" values +/- 1 apart from each other within a group.

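    import pandas as pd

    df = pd.read_csv('data.csv')
    # Shuffle the rows, then renumber the index 0..n-1
    df = df.sample(frac=1).reset_index(drop=True)
    SIZE = 100
    # Consecutive blocks of SIZE rows share a group number
    df['group'] = df.index // SIZE
    groups = [
        df[df['group'] == num]
        for num in range(df['group'].max() + 1)
    ]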

Here are a few example lines from data.csv:

Compound	molecular_mass	Plate	Column	Row	Solubility
AAA	74.12	1	1	A	100/0
BBB	74.12	3	4	D	100/0
CCC	76.12	2	5	H	80/20
DDD	120.3	6	10	F	50/50
EEE	121.3	1	1	B	100/0
FFF	119.3	1	1	C	100/0
GGG	150.3	5	13	D	100/0

data.csv has the format above (only the 6 most important of the 15 columns are shown).
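To make the constraint concrete, here is a minimal sketch of one possible greedy approach, not a definitive solution. The names `assign_groups` and `group_is_valid` are hypothetical helpers, not pandas functions. It assumes the DataFrame index is unique (as it is straight after `read_csv`) and that a difference of exactly 1 counts as "within +/- 1". Each shuffled row goes into the first group that still has room and contains no conflicting mass; a new group is opened when none fits.

```python
import pandas as pd


def group_is_valid(masses, min_gap=1.0):
    """True if every pair of masses differs by more than min_gap."""
    ordered = sorted(masses)
    # After sorting, only neighbouring values need to be compared.
    return all(b - a > min_gap for a, b in zip(ordered, ordered[1:]))


def assign_groups(df, mass_col="molecular_mass", size=100, min_gap=1.0, seed=None):
    """Greedy sketch: returns a copy of df with an integer 'group' column."""
    shuffled = df.sample(frac=1, random_state=seed)
    groups = []   # groups[g] is the list of masses already placed in group g
    labels = {}   # row index -> group number
    for idx, mass in shuffled[mass_col].items():
        for g, masses in enumerate(groups):
            fits = len(masses) < size and all(abs(mass - m) > min_gap for m in masses)
            if fits:
                masses.append(mass)
                labels[idx] = g
                break
        else:
            # No existing group can take this row, so open a new one.
            groups.append([mass])
            labels[idx] = len(groups) - 1
    out = df.copy()
    out["group"] = pd.Series(labels)  # aligned on the (unique) index
    return out


# The example rows from data.csv shown above
example = pd.DataFrame({
    "Compound": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG"],
    "molecular_mass": [74.12, 74.12, 76.12, 120.3, 121.3, 119.3, 150.3],
})
result = assign_groups(example, size=100, seed=0)
```

Because the pass is greedy, it may open more groups than strictly necessary, and later groups can hold fewer than `size` rows. With 1250 rows and groups of 100 at least one group is short in any case.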
