TPack

Reputation: 63

How do I get this for loop to return the dataframe broken up based on the ranges?

I'm trying to write a program that splits a dataframe based on a list of ranges and returns the resulting subsets of the original dataframe. The number of ranges can vary; for this example I use just 3. The issue I'm having is that when I try to split up the dataframe, only the first range's subset is printed.

I have tried the & operator, and I am currently using .between. I have also tried adding if statements with break and continue, but none of these achieved what I am after; .between has come closest.

data = { 'a':[3.0, 5.0, 7.0, 2.0], 'b':[1, 3, 5, 3], 'c':[2, 4, 6, 8]}
range = [(0,2), (3,5), (6,8)]

def sort_a(range, data):
    for item in (range):
        low, high = item
        data = data[data['a'].between(low, high)]
        print(data)

Expected

   a    b   c  
0  2.0  3   8


   a    b   c
0  3.0  1   2
1  5.0  3   4

   a    b   c
0  7.0  5   6

Actual

   a    b   c  
0  2.0  3   8
Empty DataFrame
Columns: [a, b, c]
Index: []
Empty DataFrame
Columns: [a, b, c]
Index: []

Upvotes: 0

Views: 69

Answers (2)

Anna Nevison

Reputation: 2759

If you just do:

import pandas as pd
data = { 'a':[3.0, 5.0, 7.0, 2.0], 'b':[1, 3, 5, 3], 'c':[2, 4, 6, 8]}
r = [(0,2), (3,5), (6,8)]

df = pd.DataFrame.from_dict(data)
for rr in r:
    data1 = df[df['a'].between(*rr)]
    print(data1)

you get your expected output:

     a  b  c
3  2.0  3  8
     a  b  c
0  3.0  1  2
1  5.0  3  4
     a  b  c
2  7.0  5  6

This answer is similar to the one already given: your issue is that you are overwriting the data frame when you do data = data[data['a'].between(low, high)], so after the first pass only the rows from the first range are left to filter. I also changed the name of your range variable to r. Do not give variables the same name as built-ins (range is a built-in). You can also unpack item with an asterisk directly in the .between call; you don't need to assign low and high first.
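To make the overwrite problem concrete, here is a minimal sketch that counts the rows each pattern produces. The names `bounds`, `shrinking`, `buggy_counts`, and `fixed_counts` are mine, not from the question, and I am assuming `data` was meant to be a DataFrame built from the dict shown.

```python
import pandas as pd

df = pd.DataFrame({"a": [3.0, 5.0, 7.0, 2.0], "b": [1, 3, 5, 3], "c": [2, 4, 6, 8]})
bounds = [(0, 2), (3, 5), (6, 8)]

# Buggy pattern: the filtered result replaces the frame being filtered,
# so every later range only sees the rows kept by the first range.
shrinking = df
buggy_counts = []
for low, high in bounds:
    shrinking = shrinking[shrinking["a"].between(low, high)]
    buggy_counts.append(len(shrinking))

# Fixed pattern: always filter the untouched original frame.
fixed_counts = [len(df[df["a"].between(*pair)]) for pair in bounds]

print(buggy_counts)  # [1, 0, 0]
print(fixed_counts)  # [1, 2, 1]
```

The buggy counts match the "Actual" output in the question: one row, then two empty frames.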

You can also store them in a dict, so you can refer to them later on:

d={f'df_{e}': df[df['a'].between(*rr)] for e,rr in enumerate(r)}
print(d['df_1'])

     a  b  c
0  3.0  1  2
1  5.0  3  4
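Note that the keys are numbered from 0, so 'df_1' is the second range, (3, 5). If counting positions feels error-prone, one possible variation (my own suggestion, not part of the original answer) is to key the dict by the range tuple itself. The name `by_range` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [3.0, 5.0, 7.0, 2.0], "b": [1, 3, 5, 3], "c": [2, 4, 6, 8]})
r = [(0, 2), (3, 5), (6, 8)]

# Tuples are hashable, so each (low, high) pair can serve as its own key.
by_range = {rr: df[df["a"].between(*rr)] for rr in r}

print(by_range[(3, 5)])
```

This way the lookup names the range directly instead of its position in the list.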

Upvotes: 2

ansev

Reputation: 30930

You are overwriting data on each iteration. Inside the loop, assign the result to a new variable instead:

data2 = data[data['a'].between(low, high)] 
print(data2)
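Since the question asks for the subsets to be returned and not just printed, here is a rough sketch of how the fix might look inside a function. The name `split_by_ranges` and the `column` parameter are assumptions of mine; the `reset_index(drop=True)` call is optional and is only there because the expected output in the question shows each subset numbered from 0.

```python
import pandas as pd

def split_by_ranges(df, bounds, column="a"):
    """Return a list with one subset of df per (low, high) pair."""
    subsets = []
    for low, high in bounds:
        # A new name on the left-hand side: df itself is never overwritten.
        subset = df[df[column].between(low, high)]
        # Renumber rows from 0, as in the question's expected output.
        subsets.append(subset.reset_index(drop=True))
    return subsets

df = pd.DataFrame({"a": [3.0, 5.0, 7.0, 2.0], "b": [1, 3, 5, 3], "c": [2, 4, 6, 8]})
parts = split_by_ranges(df, [(0, 2), (3, 5), (6, 8)])
for part in parts:
    print(part)
```

Keep in mind that .between includes both endpoints by default, so a value of exactly 2 or 3 falls inside its range, while something like 2.5 would match none of these ranges.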

Upvotes: 2
