How to convert column of strings into sequential numbers in Pandas?

Question

I have a column in a df that is full of strings like ["1120", "2230", "1120", "1234" ...], where not every value in this column is unique.

I want to convert these strings into SEQUENTIAL NUMBERS from 0 to N-1, where N is the number of unique values in that column, so that I can make a scatterplot with this data. Simply changing the type of the column is not sufficient for this task. I tried using dummy variables, but don't really know where to start. Any guidance is much appreciated.

Jonas · Accepted Answer

Use .unique() and .reset_index() to build a lookup table from each string to a sequential ID, then .merge() that table back onto the original one:

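import pandas as pd
df = pd.DataFrame(["1120", "2230", "1120", "1234"], columns=["num"])
# Lookup table: each unique string gets its position (0..N-1) as "num_id".
sequential = pd.Series(df["num"].unique()).reset_index().rename(columns={0: "num", "index": "num_id"})
# A left merge keeps df's original row order.
df = df.merge(sequential, on="num", how="left")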
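As an alternative that is not part of the original answer, pandas also provides pd.factorize, which returns the integer codes and the unique values in one call. A minimal sketch, assuming the same example column; the column name "num_id" is only an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({"num": ["1120", "2230", "1120", "1234"]})

# factorize assigns codes in order of first appearance:
# "1120" -> 0, "2230" -> 1, "1234" -> 2
codes, uniques = pd.factorize(df["num"])
df["num_id"] = codes  # "num_id" is a hypothetical column name

print(df["num_id"].tolist())  # [0, 1, 0, 2]
```

The codes can then be used directly as the numeric axis of a scatterplot, and uniques maps each code back to its original string.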

EDIT:

If you want to sort the number strings by their integer values first, you can add this line before you create the sequential Series. It sorts whole rows, so any other columns in df stay aligned with "num":

df = df.astype({"num": "int"}).sort_values("num").reset_index(drop=True)
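If reordering the rows is undesirable, one option is to leave the frame untouched and let pd.factorize assign IDs in sorted integer order. This is a sketch under the assumption that every string parses as an integer; "num_id" is again a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({"num": ["1120", "2230", "1120", "1234"]})

# Cast to int only for the purpose of ordering; sort=True numbers the
# unique values in ascending order: 1120 -> 0, 1234 -> 1, 2230 -> 2
codes, uniques = pd.factorize(df["num"].astype(int), sort=True)
df["num_id"] = codes

print(df["num_id"].tolist())  # [0, 2, 0, 1]
```

The row order and the original string column are unchanged; only the new ID column reflects the integer ordering.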
