Wenying Liu

Reputation: 11

How to convert a list of mixed data type into a dataframe in Python

I have a list whose items have mixed data types. It looks like this:

list = [['3D prototypes',
  'Can print large objects',
  'Autodesk Maya/Mudbox',
  '3D Studio'],
 ['We can produce ultra high resolution 3D prints in multiple materials.',
  'The quality of our prints beats MakerBot, Form 1, or any other either         
  powder based or printers using PLA, ABS, Wax or Resin. This printer has  
  the highest resolution and a very large build size. It prints fully  
  functional moving parts like a chain or an engine right out of the 
  printer.',
  'The printer is loaded with DurusWhite.',
  'Inquire to change the material. There is a $30 surcharge for material   
  switch.',
  "Also please mention your creation's dimensions in mm and if you need  
  expedite delivery.",
  "Printer's Net build size:",
  '294 x 192 x 148.6 mm (11.57 x 7.55 x 5.85 in.)',
  'The Objet30 features four Rigid Opaque materials and one material that  
  mimics polypropylene. The Vero family of materials all feature dimensional  
  stability and high-detail visualization, and are designed to simulate 
  plastics that closely resemble the end product.',
  'PolyJet based printers have a different way of working. These  
  technologies deliver the highest quality and precision unmatched by the 
  competition. These type of printers are ideal for professionals, for uses 
  ranging from casting jewelry to device prototyping.',
  'Rigid opaque white (VeroWhitePlus)',
  'Rigid opaque black (VeroBlackPlus )',
  'Rigid opaque blue (VeroBlue)',
  'Rigid opaque gray (VeroGray)',
  'Polypropylene-like material (DurusWhite) for snap fit applications'],
  'Hub can print invoices',
  'postal service',
  'Mar 2015',
  'Within the hour i',
  [u'40.7134', u'-74.0069'],
  '4',
  ['Customer JAMES reviewed Sun, 2015-04-19 05:17:        Awesome print!  
  Good quality, relatively fast shipping, and very responsive to my  
  questions; would certainly recommend this hub. ',
  'Hub XSENIO replied 2 days 16 hours ago:        Thanks James! ',
  'Customer Sara reviewed Sun, 2015-04-19 00:10:        Thank you for going  
  out of your way to get this to us in time for our shoot.  ',
 'Hub XSENIO replied 2 days 16 hours ago:        Thanks ! ',
 'Customer Aaron reviewed Sat, 2015-04-18 02:36:        Great service ',
 'Hub XSENIO replied 2 days 16 hours ago:        Thanks! ',
 "Customer Arnoldas reviewed Mon, 2015-03-23 19:47:        Xsenio's Hub was  
 able to produce an excellent quality print , was quick and reliable. 
 Awesome printing experience!  "]]

Its items have the following types:

 <type 'list'>
 <type 'list'>
 <type 'str'>
 <type 'str'>
 <type 'str'>
 <type 'str'>
 <type 'list'>
 <type 'str'>
 <type 'list'>

But when I use

 pd.DataFrame(list)

I get this error:

 TypeError: Expected list, got str

Can anyone tell me what's wrong here? Do I have to convert every string item in the list into a list?

Thanks

Upvotes: 1

Views: 1833

Answers (1)

EvenLisle

Reputation: 4812

It seems you should convert your list into a numpy array or a dict:

from pandas import DataFrame
import numpy
lst = numpy.array([['3D prototypes',
        'Can print large objects',
        'Autodesk Maya/Mudbox',
        '3D Studio'],
       ['We can produce ultra high resolution 3D prints in multiple materials.',
        '''The quality of our prints beats MakerBot, Form 1, or any other either
        powder based or printers using PLA, ABS, Wax or Resin. This printer has
        the highest resolution and a very large build size. It prints fully
        functional moving parts like a chain or an engine right out of the
        printer.''',
        'The printer is loaded with DurusWhite.',
        '''Inquire to change the material. There is a $30 surcharge for material
        switch.''',
        '''Also please mention your creation's dimensions in mm and if you need
        expedite delivery.''',
        "Printer's Net build size:",
        '294 x 192 x 148.6 mm (11.57 x 7.55 x 5.85 in.)',
        '''The Objet30 features four Rigid Opaque materials and one material that
        mimics polypropylene. The Vero family of materials all feature dimensional
        stability and high-detail visualization, and are designed to simulate
        plastics that closely resemble the end product.''',
        '''PolyJet based printers have a different way of working. These
        technologies deliver the highest quality and precision unmatched by the
        competition. These type of printers are ideal for professionals, for uses
        ranging from casting jewelry to device prototyping.''',
        'Rigid opaque white (VeroWhitePlus)',
        'Rigid opaque black (VeroBlackPlus )',
        'Rigid opaque blue (VeroBlue)',
        'Rigid opaque gray (VeroGray)',
        'Polypropylene-like material (DurusWhite) for snap fit applications'],
       'Hub can print invoices',
       'postal service',
       'Mar 2015',
       'Within the hour i',
       [u'40.7134', u'-74.0069'],
       '4',
       ['''Customer JAMES reviewed Sun, 2015-04-19 05:17:        Awesome print!
       Good quality, relatively fast shipping, and very responsive to my
       questions; would certainly recommend this hub. ''',
        'Hub XSENIO replied 2 days 16 hours ago:        Thanks James! ',
        '''Customer Sara reviewed Sun, 2015-04-19 00:10:        Thank you for going
        out of your way to get this to us in time for our shoot.  ''',
        'Hub XSENIO replied 2 days 16 hours ago:        Thanks ! ',
        'Customer Aaron reviewed Sat, 2015-04-18 02:36:        Great service ',
        'Hub XSENIO replied 2 days 16 hours ago:        Thanks! ',
        '''Customer Arnoldas reviewed Mon, 2015-03-23 19:47:        Xsenio's Hub was
        able to produce an excellent quality print , was quick and reliable.
        Awesome printing experience!  ''']])

df = DataFrame(lst)
print(df)

The above prints

                                                   0
0  [3D prototypes, Can print large objects, Autod...
1  [We can produce ultra high resolution 3D print...
2                             Hub can print invoices
3                                     postal service
4                                           Mar 2015
5                                  Within the hour i
6                                [40.7134, -74.0069]
7                                                  4
8  [Customer JAMES reviewed Sun, 2015-04-19 05:17...

[9 rows x 1 columns]

The documentation does state that the data parameter should be a numpy array or a dict: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html

PS: I also took the liberty of enclosing the multiline strings in triple quotes, since a single-quoted string cannot span several lines.
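For the dict route mentioned above, here is a minimal sketch. It assumes a shortened, made-up stand-in for the list in the question (the `data` variable and the `value` column name are purely illustrative) and a reasonably recent pandas. It also shows the alternative raised in the question, wrapping the bare strings in lists, which should give one column per nested item instead of one cell per list.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the list in the question:
# some items are lists, others are plain strings.
data = [
    ['3D prototypes', 'Can print large objects', '3D Studio'],
    'Hub can print invoices',
    ['40.7134', '-74.0069'],
]

# Dict form: the whole list becomes a single object-dtype column,
# so each nested list is kept intact inside one cell.
df = pd.DataFrame({'value': data})
print(df)

# Alternative: wrap bare strings in one-element lists so that every row
# is a list; shorter rows are padded with missing values.
rows = [item if isinstance(item, list) else [item] for item in data]
wide = pd.DataFrame(rows)
print(wide)
```

Which shape is preferable depends on what you want to do with the nested lists afterwards: the first keeps them as Python objects, the second spreads them across columns.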

Upvotes: 1
