pesc9
pesc9

Reputation: 187

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas (Python)

I'm manipulating a JSON file from which I have run this code to get the following dataframe:

import pandas as pd

topics = df.set_index('username').popular_board_data.str.extractall(r'name":"([^,]*)')
total = df.set_index('username').popular_board_data.str.extractall(r'totalCount\":([^,}]*)')

data = []
for username in df.username.unique():
for topic in zip(topics[0][username], total[0][username]):
    data.append([username, topic])

df_topic = pd.DataFrame(data, columns='username,topic'.split(','))

    username        topic
0     lukl    (Hardware", 80)
1     lukl    (Marketplace", 31)
2     lukl    (Atari 5200", 27)
3     lukl    (Atari 8-Bit Computers", 9)
4     lukl    (Modern Gaming", 3)

Now I need to split the information from the column 'topic' into two different columns:

This is the expected result:

    username        topic          _topic       _total
0     lukl    (Hardware", 80)      Hardware     80
1     lukl    (Marketplace", 31)   Marketplace  31
2     lukl    (Atari 5200", 27)    Atari 5200   27
3     lukl    (Atari 8", 9)        Atari 8      9
4     lukl    (Modern", 3)         Modern       3

I though about doing it with this code:

df_top = df_topic.copy()
df_top['_topic'] = df_topic['topic'].str.split('(').str[1].str.split('",').str[0]
df_top['_total'] = df_topic['topic'].str.split('",').str[1].str.split(')').str[0]
df_top

But I'm getting this error:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Upvotes: 4

Views: 1605

Answers (3)

heena bawa
heena bawa

Reputation: 828

You can use regular expressions like:

df['_topic'] = df['topic'].str.extract(r'([a-zA-Z]+)')
df['_total'] = df['topic'].str.extract(r'(\d+)')

  username                        topic       _topic _total
0     lukl              (Hardware", 80)     Hardware     80
1     lukl           (Marketplace", 31)  Marketplace     31
2     lukl            (Atari 5200", 27)        Atari   5200
3     lukl  (Atari 8-Bit Computers", 9)        Atari      8
4     lukl          (Modern Gaming", 3)       Modern      3

Upvotes: 0

jezrael
jezrael

Reputation: 862481

I think there are tuples, so use DataFrame constructor only:

df_topic[['_topic', '_total']]=pd.DataFrame(df_topic['topic'].values.tolist(), 
                                index=df_topic.index)

Better solution with your previous answer data with concat and DataFrame.reset_index:

df = [{"username": "last",
    "popular_board_data": "{\"boards\":[{\"postCount\":\"75\",\"topicCount\":\"5\",\"name\":\"Hardware\",\"url\",\"totalCount\":80},{\"postCount\":\"20\",\"topicCount\":\"11\",\"name\":\"Marketplace\",\"url\",\"totalCount\":31},{\"postCount\":\"26\",\"topicCount\":\"1\",\"name\":\"Atari 5200\",\"url\",\"totalCount\":27},{\"postCount\":\"9\",\"topicCount\":0,\"name\":\"Atari 8\",\"url\"\"totalCount\":9}"
    },
    {"username": "truk",
     "popular_board_data": "{\"boards\":[{\"postCount\":\"351\",\"topicCount\":\"11\",\"name\":\"Atari 2600\",\"url\",\"totalCount\":362},{\"postCount\":\"333\",\"topicCount\":\"22\",\"name\":\"Hardware\",\"url\",\"totalCount\":355},{\"postCount\":\"194\",\"topicCount\":\"8\",\"name\":\"Marketplace\",\"url\",\"totalCount\":202}"
    }]
df = pd.DataFrame(df)

#added " for remove it from output
topics = df.set_index('username').popular_board_data.str.extractall(r'name":"([^,]*)"')
total = df.set_index('username').popular_board_data.str.extractall(r'totalCount\":([^,}]*)')

df1 = pd.concat([topics[0], total[0]], axis=1, keys=['_topic', '_total'])
df1 = df1.reset_index(level=1, drop=True).reset_index()
print (df1)
  username       _topic _total
0     last     Hardware     80
1     last  Marketplace     31
2     last   Atari 5200     27
3     last      Atari 8      9
4     truk   Atari 2600    362
5     truk     Hardware    355
6     truk  Marketplace    202

Upvotes: 2

tawab_shakeel
tawab_shakeel

Reputation: 3739

I am taking topic as string if not string then convert it in string

df = pd.DataFrame(data={"username":['luk1','luk1','luk1'],
                  'topic':[ '(Hardware, 80)','(Marketplace, 31)', '(Atari 5200, 27)']})
df['_topic'] = df['topic'].apply(lambda x:str(x).split(",")[0][1:])
df['_total'] = df['topic'].apply(lambda x:str(x).split(",")[1][:-1])

   username         topic      _topic   _total
0   luk1    (Hardware, 80)      Hardware    80
1   luk1    (Marketplace, 31)   Marketplace 31
2   luk1    (Atari 5200, 27)    Atari 5200  27

Upvotes: 0

Related Questions