Reputation: 3
I am running into a problem extracting part of a column in my dataframe. As shown in the attached picture, the dataframe has a column called 'identifiers' whose values start with a lot of redundant information ({'print_isbn_canonical': '). I only want the ISBN that comes after it.
# Option 1 I tried
testdf2 = testdf2[testdf2['identifiers'].str[26:39]]
# Option 2 I tried
testdf2['identifiers_test'] = testdf2['identifiers'].str.replace("{'print_isbn_canonical': '","")
Unfortunately, both of these options turn the column into one containing only NaN values.
Please help out! I cannot seem to find the solution and have tried several things. Thank you all in advance!
Example image of the dataframe
Upvotes: 0
Views: 508
Reputation: 23217
If the values in your identifiers column are real dicts (parsed JSON objects) rather than strings, you can use the string accessor str[] to look up the dict value by key, as follows:
testdf2['identifiers_test'] = testdf2['identifiers'].str['print_isbn_canonical']
Demo
import pandas as pd
data = {'identifiers': [{'print_isbn_canonical': '9780721682167', 'eis': '1234'}]}
df = pd.DataFrame(data)
df['isbn'] = df['identifiers'].str['print_isbn_canonical']
print(df)
identifiers isbn
0 {'print_isbn_canonical': '9780721682167', 'eis': '1234'} 9780721682167
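If you are not sure whether the column holds real dicts or strings that only look like dicts, something along these lines may help. It is only a sketch: the helper name extract_isbn and the sample rows are made up for illustration, and it assumes any string values are valid Python dict literals that ast.literal_eval can parse.

```python
import ast

import pandas as pd


def extract_isbn(value, key="print_isbn_canonical"):
    """Hypothetical helper: return value[key] for a dict or for the string repr of a dict."""
    if isinstance(value, str):
        try:
            # Turn "{'a': 'b'}" into a real dict; assumes a Python-literal string.
            value = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return None
    if isinstance(value, dict):
        return value.get(key)
    # Anything else (None, NaN, numbers) has no ISBN to extract.
    return None


# Made-up sample data covering the dict case, the string case, and a missing value.
df = pd.DataFrame({
    "identifiers": [
        {"print_isbn_canonical": "9780721682167", "eis": "1234"},
        "{'print_isbn_canonical': '9780721682168'}",
        None,
    ]
})
df["isbn"] = df["identifiers"].map(extract_isbn)
print(df["isbn"].tolist())
```

Running type(testdf2['identifiers'].iloc[0]) on your own data tells you which of the two cases you actually have.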
Upvotes: 1
Reputation: 433
Try this out:
testdf2['new_column'] = testdf2.apply(lambda r : r.identifiers[26:39],axis=1)
Here I assume that the identifiers column holds strings.
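Slicing by fixed positions only works if every row has exactly the same prefix and a 13-character ISBN. If the column really is string type, a regex-based variant might be a bit more robust. This is a sketch under that assumption; the sample rows and the pattern are my own, not taken from the question's data.

```python
import pandas as pd

# Made-up sample rows; the column is assumed to hold strings, not dicts.
df = pd.DataFrame({
    "identifiers": [
        "{'print_isbn_canonical': '9780721682167', 'eis': '1234'}",
        "{'eis': '1234'}",  # no ISBN key in this row
    ]
})

# Capture whatever sits between the quotes after the key.
# expand=False returns a Series because the pattern has a single group.
pattern = r"'print_isbn_canonical': '([^']+)'"
df["isbn"] = df["identifiers"].str.extract(pattern, expand=False)
print(df["isbn"])
```

Rows where the key is missing come back as missing values instead of a wrongly sliced string.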
Upvotes: 0