Reputation: 3396
I have a messy dataframe where I am trying to "flag" the rows that contain a certain number in the ids
column. The values in this column represent an inclusive range: for example, "row 4" contains the following numbers:
2409,2410,2411,2412,2413,2414,2377,2378,1478,1479,1480,1481,1482,1483,1484
And in "row 0" and "row 1" the range for one of the sets is backwards (1931,1930,1929)
If I want to know which rows have sets that contain "2340" and "1930" for example, how would I do this? I think a loop is needed, sometimes will need to query more than just two numbers. Using Python 3.8.
Example Dataframe
x = ['1331:1332,1552:1551,1931:1928,1965:1973,1831:1811,1927:1920',
'1331:1332,1552:1551,1931:1929,180:178,1966:1973,1831:1811,1927:1920',
'2340:2341,1142:1143,1594:1593,1597:1596,1310,1311',
'2339:2341,1142:1143,1594:1593,1597:1596,1310:1318,1977:1974',
'2409:2414,2377:2378,1478:1484',
'2474:2476',
]
y = [6.48,7.02,7.02,6.55,5.99,6.39,]
df = pd.DataFrame(list(zip(x, y)), columns =['ids', 'val'])
display(df)
Upvotes: 0
Views: 434
Reputation: 1420
I would write a function that perform 2 steps:
ids_num_list
query_id
is in the ids_num_list
def check_num_in_ids_string(ids_string, query_id):
# Convert ids_string to ids_num_list
ids_range_list = ids_string.split(',')
ids_num_list = set()
for ids_range in ids_range_list:
if ':' in ids_range:
lower, upper = sorted(ids_range.split(":"))
num_list = list(range(int(lower), int(upper)+ 1))
ids_num_list.update(num_list)
else:
ids_num_list.add(int(ids_range))
# Check if query number is in the list
if int(query_id) in ids_num_list:
return 1
else:
return 0
# Example usage
query_id_list = ['2340', '1930']
for query_id in query_id_list:
df[f'n{query_id}'] = (
df['ids']
.apply(lambda x : check_num_in_ids_string(x, query_id))
)
which returns you what you require:
ids val n2340 n1930
0 1331:1332,1552:1551,1931:1928,1965:1973,1831:1... 6.48 0 1
1 1331:1332,1552:1551,1931:1929,180:178,1966:197... 7.02 0 1
2 2340:2341,1142:1143,1594:1593,1597:1596,1310,1311 7.02 1 0
3 2339:2341,1142:1143,1594:1593,1597:1596,1310:1... 6.55 1 0
4 2409:2414,2377:2378,1478:1484 5.99 0 0
5 2474:2476 6.39 0 0
Upvotes: 2