Jonah Kornberg
Jonah Kornberg

Reputation: 183

How to convert column of strings into sequential numbers in Pandas?

I have a column in a df that is full of strings like like ["1120", "2230", "1120", "1234" ...] where not every value in this column is unique.

I want to convert these strings into SEQUENTIAL NUMBERS from 0 to N where N is the number of unique values in that column so that I can make a scatterplot with this data. Simply changing the type of the column is not sufficient for this task. Any guidance is much appreciated. I tried using dummy variables, but don't really know where to start.

Upvotes: 0

Views: 1002

Answers (2)

Jonas
Jonas

Reputation: 1573

Just use .unique() and .reset_index() to get a lookup table from the strings to sequential ID and then .join() the tables:

df = pd.DataFrame(["1120", "2230", "1120", "1234"], columns=["num"])
sequential = pd.Series(df["num"].unique()).reset_index().rename(columns={0: "num"})
df.merge(sequential, on="num")

EDIT:

If you want to sort the number strings by their integer values first, you can add this line of code to sort (before you create the sequential Series):

df["num"] = df["num"].astype("int").sort_values().reset_index(drop=True)

Upvotes: 3

Patrick Artner
Patrick Artner

Reputation: 51643

You can hack this - I am not good enough to give you a pure pandas implementation, but this might work:

import pandas as pd

# prepare demo df
df = pd.DataFrame({"nums" : ["1120", "2230", "1120", "1234","9999","9"]}) 

# map(int, ...) needed for numeric sorting instead of lexicographical sorting
s = {str(v):i for i,v in enumerate(sorted(set(map(int, df["nums"]))))}

df["ReIndex"] = [s[key] for key in df["nums"]] 
print(df) 

Output:

   nums  ReIndex
0  1120        1
1  2230        3
2  1120        1
3  1234        2
4  9999        4
5     9        0

You can del df["nums"] and df.rename(columns = {'ReIndex':'nums'}, inplace = True) to get the numbers to be what you need them to be.

Upvotes: 0

Related Questions