Reputation: 9
I want to count the number of elements in a row that fall within a range and then print a new column with the results. After looking around I came up with the following solution, however the results are not consistent. Is the solution too simplistic for what I want to accomplish? I have the following DataFrame.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,10,size=(10, 3)), columns=list('ABC'))
def val_1(row):
val_1 = 0
if row.loc['A'] and row.loc['B'] and row.loc["C"] in range(1,3):
val_1 = 3
elif row.loc['A'] and row.loc['B'] in range(1,3):
val_1 = 2
elif row.loc['A'] in range(1,3):
val_1 = 1
return val_1
def val_2(row):
val_2 = 0
if row.loc['A'] and row.loc['B'] and row.loc["C"] in range(3,6) :
val_2 = 3
elif row.loc['A'] and row.loc['B'] in range(3,6) :
val_2 = 2
elif row.loc['A'] in range(3,6) :
val_2 = 1
return val_2
def val_3(row):
val_3 = 0
if row.loc['A'] and row.loc['B'] and row.loc["C"] in range(6,10) :
val_3 = 3
elif row.loc['A'] and row.loc['B'] in range(6,10) :
val_3 = 2
elif row.loc['A']in range(6,10) :
val_3 = 1
return val_3
def results():
df['Val_1'] = df.apply(val_1, axis=1)
df['Val_2'] = df.apply(val_2, axis=1)
df['Val_3'] = df.apply(val_3, axis=1)
print(df)
results()
A B C Val_1 Val_2 Val_3
0 9 0 0 0 0 1
1 6 1 0 2 0 1
2 8 5 5 0 3 1
3 9 7 0 0 0 2
4 4 6 2 3 1 2
5 1 5 5 1 3 0
6 8 1 7 2 0 3
7 4 8 5 0 3 2
8 0 6 0 0 0 0
9 3 0 3 0 1 0
Thanks for your help.
Upvotes: 1
Views: 477
Reputation: 11321
Another approach which is using numpy.select
(recommended in the Pandas User Guide here):
import numpy as np
for n, (start, stop) in enumerate([(1, 3), (3, 6), (6, 10)], start=1):
m = df.isin(range(start, stop))
condlist = [m.all(axis=1), m[["A", "B"]].all(axis=1), m["A"]]
df[f"Val_{n}"] = np.select(condlist, [3, 2, 1])
m
on df
that checks if the df
values are in the resp. range(start, stop)
.m
build a condition list condlist
: The 1. entry checks if all values are in the range, the 2. checks if the values in columns A
and B
are in the range, and the 3. checks if the value in column A
is in the range - all checks are done row-wise.condlist
set the corresponding values from [3, 2, 1]
(called choicelist
in the numpy.select
docs) in the new column, and 0
(standard default
) if no condition is met.The selection follows the preferences in you code - see the numpy.select
documentation:
When multiple conditions are satisfied, the first one encountered in condlist is used.
Your question actually sounds a bit different to what you are doing:
I want to count the number of elements in a row that fall within a range and then print a new column with the results.
If that is your real goal, then you could try something simpler:
for n, (start, stop) in enumerate([(1, 3), (3, 6), (6, 10)], start=1):
df[f"Val_{n}"] = df.isin(range(start, stop)).sum(axis=1)
Why your approach fails (in addition to @Mazhar's explanation): This
if row.loc['A'] and row.loc['B'] and row.loc["C"] in range(1,3):
isn't how logical operators like and
work: You have to fully specify each part of the condition, like:
if (row.loc['A'] in range(1,3)) and (row.loc['B'] in range(1,3)) and (row.loc["C"] in range(1,3)):
The way you have used it actually resolves to
if bool(row.loc['A']) and bool(row.loc['B']) and row.loc["C"] in range(1,3):
which here means (since the values are numbers)
if (row.loc['A'] != 0) and (row.loc['B'] != 0) and (row.loc["C"] in range(1,3)):
Upvotes: 1
Reputation: 1064
This is shorter version of your code. The problem is in your code:
row.loc['A'] and row.loc['B'] and row.loc["C"] in range(start,end)
C
column (Not for A
& B
)def get_val(row,start,end):
val = 0
if row.loc['A'] and row.loc['B'] and row.loc["C"] in range(start,end) :
val = 3
elif row.loc['A'] and row.loc['B'] in range(start,end) :
val = 2
elif row.loc['A'] in range(start,end) :
val = 1
return val
df = pd.DataFrame(np.random.randint(0,10,size=(10, 3)), columns=list('ABC'))
df['Val_1'] = df.apply(lambda x:get_val(x,1,3), axis=1)
df['Val_2'] = df.apply(lambda x:get_val(x,3,6), axis=1)
df['Val_3'] = df.apply(lambda x:get_val(x,6,10), axis=1)
My Solution
def query_build(start, end):
query1 = f'A>={start} and A<={end} and B>={start} and B<={end} and C>={start} and C<={end}'
query2 = f'A>={start} and A<={end} and B>={start} and B<={end}'
query3 = f'A>={start} and A<={end}'
return {query1: 3,
query2: 2,
query3: 1}
df = pd.DataFrame(np.random.randint(0,10,size=(10, 3)), columns=list('ABC'))
df['val1'] = 0
df['val2'] = 0
df['val3'] = 0
val_range = {'val1':(1,2),'val2':(3,5),'val3':(6,9)}
for name, r_range in val_range.items():
query_set = query_build(*r_range)
for query, val in query_set.items():
length = len(df.query(query))
if length:
df[name][df.query(query).index] = val
print(df)
Upvotes: 1