pepsi_max2k

Reputation: 15

Finding a row in a DataFrame when the index is a mix of ints and strings?

Minor problem doing my head in. I have a DataFrame similar to the following:

Number      Title
12345678    A
34567890-S  B
11111111    C
22222222-L  D

This is read from an Excel file using pandas in Python, and the index is then set to the first column:

db = db.set_index(['Number'])

I then look up Title based on Number:

lookup = "12345678"
title = str(db.loc[lookup, 'Title'])

However, whilst anything postfixed with "-Something" works, anything without a postfix isn't found (e.g. 12345678 finds nothing, 34567890-S does). My only hunch is that it's to do with looking up as either strings or ints. I've tried a few things (converting the table to all strings, changing loc to iloc, ix, etc.) but so far no luck.

Any ideas? Thanks :)

UPDATE: So trying this from scratch doesn't exhibit the same behaviour (creating a test db presumably just sets everything as strings), however importing from CSV is resulting in the above, and...

Searching "12345678" (as a string) doesn't find it, but 12345678 as an int will. Likewise the opposite for the others. So the dataframe is only matching the pure numbers in the index with ints, but anything else with strings.
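The behaviour described in the update can be reproduced with a minimal sketch. The frame below is hand-built as an assumption about what the file reader produced (purely numeric cells as ints, everything else as strings); it is not the original file.

```python
import pandas as pd

# Hypothetical data: numeric-looking cells are ints, the rest are strings,
# which is what a spreadsheet or CSV reader may hand back for such a column.
db = pd.DataFrame(
    {
        "Number": [12345678, "34567890-S", 11111111, "22222222-L"],
        "Title": ["A", "B", "C", "D"],
    }
).set_index("Number")

# Inspect the Python type of every index label.
label_types = [type(label).__name__ for label in db.index]
print(label_types)

# The same digits match as an int but not as a string.
int_found = 12345678 in db.index
str_found = "12345678" in db.index
print(int_found, str_found)
```

Printing the label types like this is a quick way to confirm whether an index is mixed before deciding how to cast it.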

Also, I can't drop the postfix from the search, as I have multiple rows that differ only in their postfix, e.g. 34567890-S, 34567890-L, 34567890-X.

Upvotes: 0

Views: 893

Answers (2)

Graipher

Reputation: 7186

If you want to cast all entries to one particular type, you can use pandas.Series.astype:

db["Number"] = db["Number"].astype(str)
db = db.set_index(['Number'])

lookup = "12345678"
title = db.loc[lookup, 'Title']

Interestingly this is actually slower than using pandas.Index.map:

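import numpy as np
import pandas as pd

# Test inputs of increasing size: Series for astype, Index for map
x1 = [pd.Series(np.arange(n)) for n in np.logspace(1, 4, dtype=int)]
x2 = [pd.Index(np.arange(n)) for n in np.logspace(1, 4, dtype=int)]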

def series_astype(x1):
    return x1.astype(str)

def index_map(x2):
    return x2.map(str)
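
To tie the two casts back to the original lookup, here is a minimal sketch, assuming a small hand-built frame that mimics the mixed int/string column (the sample data is illustrative, not the asker's file). It casts once with astype(str) before set_index and once with Index.map(str) after it, then does the same kind of lookup on each.

```python
import pandas as pd

# Hypothetical mixed column: ints for plain numbers, strings for suffixed ones.
raw = pd.DataFrame(
    {
        "Number": [12345678, "34567890-S", 11111111, "22222222-L"],
        "Title": ["A", "B", "C", "D"],
    }
)

# Option 1: cast the column to str before it becomes the index.
by_astype = raw.assign(Number=raw["Number"].astype(str)).set_index("Number")

# Option 2: set the index first, then map str over its labels.
by_map = raw.set_index("Number")
by_map.index = by_map.index.map(str)

# Either way, every label is now a string, so string lookups work.
title_plain = by_astype.loc["12345678", "Title"]
title_suffixed = by_map.loc["34567890-S", "Title"]
print(title_plain, title_suffixed)
```

Which of the two is faster will depend on the data size and pandas version, so it is worth timing both on your own data.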


Upvotes: 4

w-m

Reputation: 11232

Treat all the index values as strings, since at least some of them are not numbers. If you want to look up an item that might have a postfix, you can match it by comparing the start of each string with .str.startswith:

lookup = db.index.str.startswith("34567890")
title = db.loc[lookup, "Title"]
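
As a self-contained sketch of the prefix match, the example below assumes a hand-built frame with several postfixes on the same number (illustrative data only) and casts the column to str first, so that the .str accessor sees only strings.

```python
import pandas as pd

# Hypothetical data: one bare number plus two suffixed variants of it.
db = pd.DataFrame(
    {
        "Number": [34567890, "34567890-S", "34567890-L", 11111111],
        "Title": ["A", "B", "C", "D"],
    }
)

# Cast to str before indexing so every label supports string methods.
db["Number"] = db["Number"].astype(str)
db = db.set_index("Number")

# Boolean mask: True for every label that starts with the given digits.
mask = db.index.str.startswith("34567890")
titles = db.loc[mask, "Title"]
print(titles.tolist())
```

Note that the result is a Series holding every matching Title rather than a single value, because several labels can share the same prefix.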

Upvotes: 0
