Reputation: 91
I'm looking for a function that returns the position of an element in a DataFrame.
- There are duplicates among the values in the DataFrame.
- The DataFrame is about 10*2000.
- The function will be applied to the DataFrame using applymap().
import pandas

# initial dataframe
df = pandas.DataFrame({"R1": [8,2,3], "R2": [2,3,4], "R3": [-3,4,-1]})
Example:
get_position(2) is ambiguous, as it could refer to either "R1" or "R2". I am wondering whether there is another way for Python to know which position the element holds, possibly during the applymap() operation.
Edit:
df.rank(axis=1,pct=True)
EDIT2:
# initial dataframe
df_initial = pandas.DataFrame({"R1": [8,2,3], "R2": [2,3,4], "R3": [-3,4,-1]})
step1)
df_rank = df_initial.rank(axis=1,pct=True)
step2)
# Building Groups based on the percentage of the respective value
def function103(x):
    if 0.0 <= x <= 0.1:
        P1.append(get_column_name1(x))
        return x
    elif 0.1 < x <= 0.2:
        P2.append(get_column_name1(x))
        return x
    elif 0.2 < x <= 0.3:
        P3.append(get_column_name1(x))
        return x
    elif 0.3 < x <= 0.4:
        P4.append(get_column_name1(x))
        return x
    elif 0.4 < x <= 0.5:
        P5.append(get_column_name1(x))
        return x
    elif 0.5 < x <= 0.6:
        P6.append(get_column_name1(x))
        return x
    elif 0.6 < x <= 0.7:
        P7.append(get_column_name1(x))
        return x
    elif 0.7 < x <= 0.8:
        P8.append(get_column_name1(x))
        return x
    elif 0.8 < x <= 0.9:
        P9.append(get_column_name1(x))
        return x
    elif 0.9 < x <= 1.0:
        P10.append(get_column_name1(x))
        return x
    else:
        return x
step3)
# trying to get the column name of the respective value
# my idea was to determine the position of each value and then write a function
def get_column_name1(x):
    # to return the value's column name
    ...
step 4)
# apply the function
P1=[]
P2=[]
P3=[]
P4=[]
P5=[]
P6=[]
P7=[]
P8=[]
P9=[]
P10=[]
P11=[]
df_rank.applymap(function103).head()
Upvotes: 4
Views: 4976
Reputation: 863226
If you need the index or column names for a given value in a DataFrame, use numpy.where
to get the positions, then select the matching index or column values, converted to a NumPy array:
import numpy as np
import pandas as pd

df = pd.DataFrame({"R1": [8,2,3], "R2": [2,3,4], "R3": [-3,4,-1]})
i, c = np.where(df == 2)
print (i, c)
[0 1] [1 0]
print (df.index.values[i])
[0 1]
print (df.columns.values[c])
['R2' 'R1']
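The same lookup can be wrapped in a small helper. This is a minimal sketch: the name get_positions comes from the question, but its signature (taking the DataFrame as an argument and returning a list of pairs) is an assumption, not part of pandas or NumPy:

```python
import numpy as np
import pandas as pd


def get_positions(frame, value):
    """Return (index label, column name) pairs for every cell equal to value."""
    # np.where on a 2-D boolean array gives integer row and column positions
    rows, cols = np.where((frame == value).to_numpy())
    # translate integer positions back into labels
    return list(zip(frame.index[rows].tolist(), frame.columns[cols].tolist()))


df = pd.DataFrame({"R1": [8, 2, 3], "R2": [2, 3, 4], "R3": [-3, 4, -1]})
positions = get_positions(df, 2)
print(positions)
# [(0, 'R2'), (1, 'R1')]
```

Because duplicates are possible, the helper returns every match instead of a single position.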
EDIT:
i, c = np.where(df == 2)
df1 = df.rank(axis=1,pct=True)
print (df1)
         R1        R2        R3
0  1.000000  0.666667  0.333333
1  0.333333  0.666667  1.000000
2  0.666667  1.000000  0.333333
print (df1.iloc[i, c])
         R2        R1
0  0.666667  1.000000
1  0.666667  0.333333
print (df1.where(df == 2).dropna(how='all').dropna(how='all', axis=1))
         R1        R2
0       NaN  0.666667
1  0.333333       NaN
Or:
out = df1.stack()[df.stack() == 2].rename_axis(('idx','cols')).reset_index(name='val')
print (out)
   idx cols       val
0    0   R2  0.666667
1    1   R1  0.333333
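If the goal is for every value to carry its position along, one option is to keep the whole table in long format instead of filtering for a single value. A sketch under that assumption; the names long_df and pct_rank are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"R1": [8, 2, 3], "R2": [2, 3, 4], "R3": [-3, 4, -1]})
df1 = df.rank(axis=1, pct=True)

# One row per cell: row label, column name, original value, percentage rank
long_df = pd.concat({"value": df.stack(), "pct_rank": df1.stack()}, axis=1)
long_df = long_df.rename_axis(["idx", "cols"]).reset_index()

# Duplicated values are no longer ambiguous, each row knows its own position
print(long_df[long_df["value"] == 2])
```

With this layout, filtering by value or by rank is an ordinary boolean selection, and the idx and cols columns answer the "where is it" question directly.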
EDIT:
Solution for your function: reshape to a one-column DataFrame, iterate over its rows, and read Series.name from each row, which is the same as the column name:
def get_column_name1(x):
    return x.name
P1=[]
P2=[]
P3=[]
P4=[]
P5=[]
P6=[]
P7=[]
P8=[]
P9=[]
P10=[]
P11=[]
def function103(x):
    if 0.0 <= x[0] <= 0.1:
        P1.append(get_column_name1(x))
        return x
    elif 0.1 < x[0] <= 0.2:
        P2.append(get_column_name1(x))
        return x
    elif 0.2 < x[0] <= 0.3:
        P3.append(get_column_name1(x))
        return x
    elif 0.3 < x[0] <= 0.4:
        P4.append(get_column_name1(x))
        return x
    elif 0.4 < x[0] <= 0.5:
        P5.append(get_column_name1(x))
        return x
    elif 0.5 < x[0] <= 0.6:
        P6.append(get_column_name1(x))
        return x
    elif 0.6 < x[0] <= 0.7:
        P7.append(get_column_name1(x))
        return x
    elif 0.7 < x[0] <= 0.8:
        P8.append(get_column_name1(x))
        return x
    elif 0.8 < x[0] <= 0.9:
        P9.append(get_column_name1(x))
        return x
    elif 0.9 < x[0] <= 1.0:
        P10.append(get_column_name1(x))
        return x
    else:
        return x
a = df_rank.stack().reset_index(level=0, drop=True).to_frame().apply(function103, axis=1)
print (P4)
['R3', 'R1', 'R3']
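As a possible alternative to the ten module-level lists, the grouping could also be done with pandas.cut on the stacked ranks. This is only a sketch, not the approach used above; edges, labels and groups are illustrative names, and the bin edges are assumed to match the if/elif chain:

```python
import pandas as pd

df_initial = pd.DataFrame({"R1": [8, 2, 3], "R2": [2, 3, 4], "R3": [-3, 4, -1]})
df_rank = df_initial.rank(axis=1, pct=True)

# Assumed to mirror the if/elif chain: [0, 0.1], (0.1, 0.2], ..., (0.9, 1.0]
edges = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = [f"P{n}" for n in range(1, 11)]

# Stack so each rank keeps its (row, column) position in the index
stacked = df_rank.stack()
col_names = stacked.index.get_level_values(1)
bins = pd.cut(stacked.to_numpy(), bins=edges, labels=labels, include_lowest=True)

# Collect the column name of every rank under its bin label
groups = {label: [] for label in labels}
for label, col in zip(bins, col_names):
    if pd.notna(label):  # values outside [0, 1] get no bin and are skipped
        groups[label].append(col)

print(groups["P4"])
# ['R3', 'R1', 'R3']
```

Keeping the result in one dictionary avoids the global lists and makes it easy to change the number of bins.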
Upvotes: 3