a11
a11

Reputation: 3396

Python and Pandas, find rows that contain value, target column has many sets of ranges

I have a messy dataframe where I am trying to "flag" the rows that contain a certain number in the ids column. The values in this column represent an inclusive range: for example, "row 4" contains the following numbers: 2409,2410,2411,2412,2413,2414,2377,2378,1478,1479,1480,1481,1482,1483,1484
And in "row 0" and "row 1" the range for one of the sets is backwards (1931,1930,1929)

If I want to know which rows have sets that contain "2340" and "1930" for example, how would I do this? I think a loop is needed, sometimes will need to query more than just two numbers. Using Python 3.8.

Example Dataframe

x = ['1331:1332,1552:1551,1931:1928,1965:1973,1831:1811,1927:1920',
     '1331:1332,1552:1551,1931:1929,180:178,1966:1973,1831:1811,1927:1920',
     '2340:2341,1142:1143,1594:1593,1597:1596,1310,1311',
     '2339:2341,1142:1143,1594:1593,1597:1596,1310:1318,1977:1974',
     '2409:2414,2377:2378,1478:1484',
     '2474:2476',
]
y = [6.48,7.02,7.02,6.55,5.99,6.39,]
df = pd.DataFrame(list(zip(x, y)), columns =['ids', 'val']) 
display(df)

enter image description here

Desired Output Dataframe
enter image description here

Upvotes: 0

Views: 434

Answers (1)

Toukenize
Toukenize

Reputation: 1420

I would write a function that perform 2 steps:

  1. Given the ids_string that contains the range of ids, list all the ids as ids_num_list
  2. Check if the query_id is in the ids_num_list
def check_num_in_ids_string(ids_string, query_id):
    
    # Convert ids_string to ids_num_list
    ids_range_list = ids_string.split(',')
    ids_num_list = set()
    
    for ids_range in ids_range_list:
        
        if ':' in ids_range:
            lower, upper = sorted(ids_range.split(":"))
            num_list = list(range(int(lower), int(upper)+ 1))
            ids_num_list.update(num_list)
            
        else:
            ids_num_list.add(int(ids_range))
            
    # Check if query number is in the list
    
    if int(query_id) in ids_num_list:
        return 1
    
    else:
        return 0

# Example usage
query_id_list = ['2340', '1930']

for query_id in query_id_list:
    df[f'n{query_id}'] = (
        df['ids']
        .apply(lambda x : check_num_in_ids_string(x, query_id))
    )

which returns you what you require:

    ids                                                 val     n2340   n1930
0   1331:1332,1552:1551,1931:1928,1965:1973,1831:1...   6.48    0       1
1   1331:1332,1552:1551,1931:1929,180:178,1966:197...   7.02    0       1
2   2340:2341,1142:1143,1594:1593,1597:1596,1310,1311   7.02    1       0
3   2339:2341,1142:1143,1594:1593,1597:1596,1310:1...   6.55    1       0
4   2409:2414,2377:2378,1478:1484                       5.99    0       0
5   2474:2476                                           6.39    0       0

Upvotes: 2

Related Questions