Reputation: 858
I have a sorted list, for example,
S = [0, 10.2, 345.9, ...]
If S is large (500k+ elements), what is the best way to find what percentile each element belongs to?
My goal is to store in database table that looks something like this:
Svalue | Percentile
-------------------
0 | a
10.2. | b
345.9 | c
... | ...
Upvotes: 0
Views: 275
Reputation: 5788
Solution:
# Import and initialise pandas into session:
import pandas as pd
# Store a scalar of the length of the list: list_length => list
list_length = len(S)
# Use a list comprehension to retrieve the indices of each element: idx => list
idx = [index for index, value in enumerate(S)]
# Divide each of the indices by the list_length scalar using a list
# comprehension: percentile_rank => list
percentile_rank = [el / list_length for el in idx]
# Column bind separate lists into a single DataFrame in order to achieved desired format: df => pd.DataFrame
df = pd.DataFrame({"Svalue": S, "Percentile": percentile_rank})
# Send the first 6 rows to console: stdout
df.head()
Data:
# Ensure list is sorted: S => list
S = sorted([0, 10.2, 345.9])
# Print the result: stdout
print(S)
Upvotes: 1