Frank.lee

Reputation: 31

Reshape a list of (title, cpc codes) tuples into one tuple per code

I'm working with patent data using pandas and NumPy. The steps I've taken so far, and the data I got from the raw data, are shown below.

code

title = df['title'].tolist()
cpc = df_cpu['cpc'].tolist() 
z = zip(title, cpc)

result

(('(real-time information transmission system)',
  'A61B-0005/0002, A61B-0005/0001, A61B-0005/0021'),
 ('(skincare counselling system)',
  'G06Q-0050/0010'),
 ('(apparatus for monitoring posture)',
  'A61B-0005/1116, A61B-0005/0002'),
 ...
)

It's basically a list (or tuple) of patent titles, each paired with its own CPC codes, which define the sub-technology areas the patent belongs to. I'd like to split (or should I say reshape) this data as shown below. I guess it's not just a split but a reshape following a specific rule: one (title, code) pair per code.

(('(real-time information transmission system)',
  'A61B-0005/0002'),
 ('(real-time information transmission system)',
  'A61B-0005/0001'),
 ('(real-time information transmission system)',
  'A61B-0005/0021'),
 ('(skincare counselling system)',
  'G06Q-0050/0010'),
 ('(apparatus for monitoring posture)',
  'A61B-0005/1116'),
 ('(apparatus for monitoring posture)',
  'A61B-0005/0002'),
 ...
)

I thought about counting the commas in each string and copying the title that many times, but I guess there should be an easier way, and I don't even know how to implement the approach I had in mind.

Upvotes: 0

Views: 71

Answers (1)

sglvladi

Reputation: 86

If I understood the end goal correctly, you want to use split() to split the cpc codes string, using ',' as the separator. This will generate a list, which you can then iterate through to create a new list/tuple.
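As a quick illustration of what split() does to one of the strings from the question (a minimal sketch; the variable names raw_codes and clean_codes are just for this example): splitting on ',' alone keeps the space that followed each comma, so a strip() is probably needed to match the desired output exactly.

```python
# One cpc string taken from the question's sample data
cpc_codes_str = 'A61B-0005/0002, A61B-0005/0001, A61B-0005/0021'

# Splitting on ',' alone keeps the space that followed each comma
raw_codes = cpc_codes_str.split(',')

# Stripping each piece removes that leading space
clean_codes = [code.strip() for code in raw_codes]

print(raw_codes)    # second and third items start with a space
print(clean_codes)  # no surrounding whitespace
```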

Here is a snippet that I think accomplishes what you want:

from pprint import pprint

z = (('(real-time information transmission system)', 'A61B-0005/0002, A61B-0005/0001, A61B-0005/0021'),
     ('(skincare counselling system)', 'G06Q-0050/0010'),
     ('(apparatus for monitoring posture)', 'A61B-0005/1116, A61B-0005/0002'))

new_z = []
for title, cpc_codes_str in z:
    # Split the comma-separated codes into a list
    cpc_codes = cpc_codes_str.split(',')
    for code in cpc_codes:
        # strip() removes the space left over after each comma
        new_z.append((title, code.strip()))

pprint(tuple(new_z))

and this is what is printed:

(('(real-time information transmission system)', 'A61B-0005/0002'),
 ('(real-time information transmission system)', 'A61B-0005/0001'),
 ('(real-time information transmission system)', 'A61B-0005/0021'),
 ('(skincare counselling system)', 'G06Q-0050/0010'),
 ('(apparatus for monitoring posture)', 'A61B-0005/1116'),
 ('(apparatus for monitoring posture)', 'A61B-0005/0002'))
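The nested loop above can also be written as a single list comprehension. This is only a sketch of the same idea: the helper name expand_codes and its sep parameter are made up for this example, and it assumes every code string is comma-separated as in the sample data. It also strips whitespace and skips empty pieces, which the question's data may or may not need.

```python
def expand_codes(pairs, sep=','):
    """Return one (title, code) tuple per code found in each codes string."""
    return [
        (title, code.strip())
        for title, codes_str in pairs
        for code in codes_str.split(sep)
        if code.strip()  # skip empty pieces, e.g. from a trailing comma
    ]


# Sample data copied from the question
z = (('(real-time information transmission system)', 'A61B-0005/0002, A61B-0005/0001, A61B-0005/0021'),
     ('(skincare counselling system)', 'G06Q-0050/0010'),
     ('(apparatus for monitoring posture)', 'A61B-0005/1116, A61B-0005/0002'))

expanded = expand_codes(z)
for pair in expanded:
    print(pair)
```

Because zip() returns a one-shot iterator in Python 3, it may be safer to pass list(zip(title, cpc)) if the pairs need to be reused afterwards.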

Hope this helps.

Upvotes: 1
