Indexing column in Pandas Dataframe returns NaN

Question

I am running into a problem when trying to index my dataframe. As shown in the attached picture, I have a column called 'identifiers' that contains a lot of redundant information ({'print_isbn_canonical': '). I only want the ISBN that comes after it.

    #Option 1 I tried
    testdf2 = testdf2[testdf2['identifiers'].str[26:39]]
    
    #Option 2 I tried
    testdf2['identifiers_test'] = testdf2['identifiers'].str.replace("{'print_isbn_canonical': '","")

Unfortunately, both of these options turn the column into one containing only NaN values.

Please help out! I cannot seem to find the solution and have tried several things. Thank you all in advance!

Example image of the dataframe

SeaBean · Accepted Answer

If the values in your identifiers column are real dict objects (rather than strings that merely look like dicts), you can use the string accessor str[] to look up a dict value by key, as follows:

    testdf2['identifiers_test'] = testdf2['identifiers'].str['print_isbn_canonical']
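To find out whether the column really holds dicts, you can inspect the Python type of each cell. The snippet below is only a sketch: the one-row frame is a hypothetical stand-in for your testdf2, built from the value visible in the question. If the cells turn out to be dicts, that would also be consistent with the NaN results, since string methods such as str.replace generally yield NaN for cells that are not strings.

```python
import pandas as pd

# Hypothetical one-row frame standing in for testdf2 (assumed data)
testdf2 = pd.DataFrame(
    {'identifiers': [{'print_isbn_canonical': '9780721682167', 'eis': '1234'}]}
)

# Count the Python types stored in the column's cells
cell_types = testdf2['identifiers'].map(type).value_counts()
print(cell_types)

# True only if every cell is a dict
holds_dicts = testdf2['identifiers'].map(lambda v: isinstance(v, dict)).all()
print(holds_dicts)
```

If this reports dict for every row, the str['print_isbn_canonical'] lookup is the appropriate tool.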

Demo

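    import pandas as pd

    # One row whose cell is a real dict
    data = {'identifiers': [{'print_isbn_canonical': '9780721682167', 'eis': '1234'}]}
    df = pd.DataFrame(data)

    # Look up the dict value by key through the string accessor
    df['isbn'] = df['identifiers'].str['print_isbn_canonical']

    print(df)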

                                                identifiers           isbn
0  {'print_isbn_canonical': '9780721682167', 'eis': '1234'}  9780721682167
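If the cells are instead strings that only look like dicts (for example after loading the data from a CSV file, which is an assumption here and not something the question states), one possible approach is to parse them first with ast.literal_eval from the standard library and then apply the same key lookup. A minimal sketch with hypothetical data:

```python
import ast
import pandas as pd

# Hypothetical frame where each cell is the *text* of a dict (assumed data)
df = pd.DataFrame(
    {'identifiers': ["{'print_isbn_canonical': '9780721682167', 'eis': '1234'}"]}
)

# Parse each string into a real dict, then look up the key as in the demo above
parsed = df['identifiers'].map(ast.literal_eval)
df['isbn'] = parsed.str['print_isbn_canonical']

print(df['isbn'].tolist())
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for this kind of text; it will raise an error on cells that are not valid literals, so missing or malformed values would need separate handling.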