I'm trying to write a program that splits a DataFrame into subsets based on a list of value ranges and returns those subsets. The number of ranges can vary; for this example I just use 3. The issue is that when I try to split up the DataFrame, only the first range is printed.
I have tried an `&` statement and I am currently using `.between`. I have also tried adding `if` statements with `break` and `continue`, but none of these achieved what I am after; `.between` has come closest.
```python
import pandas as pd

data = {'a': [3.0, 5.0, 7.0, 2.0], 'b': [1, 3, 5, 3], 'c': [2, 4, 6, 8]}
range = [(0, 2), (3, 5), (6, 8)]

def sort_a(range, data):
    for item in range:
        low, high = item
        data = data[data['a'].between(low, high)]
        print(data)

sort_a(range, pd.DataFrame(data))
```
Expected:

```
     a  b  c
0  2.0  3  8
     a  b  c
0  3.0  1  2
1  5.0  3  4
     a  b  c
0  7.0  5  6
```

Actual:

```
     a  b  c
0  2.0  3  8
Empty DataFrame
Columns: [a, b, c]
Index: []
Empty DataFrame
Columns: [a, b, c]
Index: []
```
If you just do:

```python
import pandas as pd

data = {'a': [3.0, 5.0, 7.0, 2.0], 'b': [1, 3, 5, 3], 'c': [2, 4, 6, 8]}
r = [(0, 2), (3, 5), (6, 8)]
df = pd.DataFrame.from_dict(data)
for rr in r:
    data1 = df[df['a'].between(*rr)]
    print(data1)
```
you get your expected output (note that each subset keeps the row labels from the original DataFrame):

```
     a  b  c
3  2.0  3  8
     a  b  c
0  3.0  1  2
1  5.0  3  4
     a  b  c
2  7.0  5  6
```
This answer is similar to the one already given: your issue is that you overwrite the DataFrame when you do `data = data[data['a'].between(low, high)]`, so every range after the first filters the already-filtered result instead of the original data. I also changed the name of your `range` variable to `r`. Do not give variables the same name as built-in functions (`range` is a built-in function). You can also unpack each tuple directly into `.between` with the asterisk (`*rr`); you don't need to assign `low` and `high` first and then pass them in.
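To make both points concrete, here is a small sketch using the question's data. It checks that `between`, which is inclusive on both ends by default, selects the same rows as an explicit `&` of two comparisons (the form the question's `&` attempt would need, with parentheses around each comparison), and it reproduces the shrinking-DataFrame effect of reassigning inside the loop. The names `via_between`, `via_and` and `sizes` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [3.0, 5.0, 7.0, 2.0], 'b': [1, 3, 5, 3], 'c': [2, 4, 6, 8]})
r = [(0, 2), (3, 5), (6, 8)]

# Series.between is inclusive on both ends by default, so it picks the
# same rows as combining two comparisons with &.
for low, high in r:
    via_between = df[df['a'].between(low, high)]
    via_and = df[(df['a'] >= low) & (df['a'] <= high)]  # parentheses are required
    assert via_between.equals(via_and)

# Reproducing the bug: reassigning `data` inside the loop means each range
# filters the previous (already shrunken) result, not the original frame.
data = df
sizes = []
for low, high in r:
    data = data[data['a'].between(low, high)]
    sizes.append(len(data))
print(sizes)  # [1, 0, 0]
```

After the first range only the row with `a == 2.0` is left, so the ranges `(3, 5)` and `(6, 8)` can no longer match anything.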
You can also store the subsets in a dict, to refer to them later on:

```python
d = {f'df_{e}': df[df['a'].between(*rr)] for e, rr in enumerate(r)}
print(d['df_1'])
```

```
     a  b  c
0  3.0  1  2
1  5.0  3  4
```
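A small variation on the dict idea, offered as a sketch rather than the answerer's own code: key each subset by its `(low, high)` tuple instead of a `'df_N'` string, so the lookup itself says which range it refers to. The name `subsets` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [3.0, 5.0, 7.0, 2.0], 'b': [1, 3, 5, 3], 'c': [2, 4, 6, 8]})
r = [(0, 2), (3, 5), (6, 8)]

# Tuples are hashable, so each (low, high) range can be its own dict key.
subsets = {rr: df[df['a'].between(*rr)] for rr in r}

print(subsets[(3, 5)])
```

The trade-off is that you need the exact tuple to look a subset up, whereas the `enumerate` version lets you refer to subsets by position.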
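You are overwriting `data` on every iteration, so each range filters the previous result rather than the full DataFrame. Assign the filtered frame to a new name instead:

```python
def sort_a(ranges, data):
    for low, high in ranges:
        data2 = data[data['a'].between(low, high)]
        print(data2)
```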
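If you want the function to return the subsets (as the question asks) rather than only print them, one option is to collect them in a list. This is a sketch: the name `split_by_ranges` and its `col` parameter are hypothetical, and `reset_index(drop=True)` is there only to renumber rows from 0 as in the question's expected output.

```python
import pandas as pd

def split_by_ranges(df, ranges, col='a'):
    """Hypothetical helper: one DataFrame per (low, high) range, each filtered from the original df."""
    subsets = []
    for low, high in ranges:
        # Filter the untouched input frame; `df` is never reassigned.
        part = df[df[col].between(low, high)]
        # Renumber rows from 0 as in the question's "Expected" output;
        # leave this out to keep the original row labels.
        subsets.append(part.reset_index(drop=True))
    return subsets

data = pd.DataFrame({'a': [3.0, 5.0, 7.0, 2.0], 'b': [1, 3, 5, 3], 'c': [2, 4, 6, 8]})
parts = split_by_ranges(data, [(0, 2), (3, 5), (6, 8)])
for part in parts:
    print(part)
```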