Easy way to convert list of string to numpy array

I am working with data from World Ocean Database (WOD), and somehow I ended up with a list that looks like this one:

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

Is there an easy way to convert this structure into a numpy array or some friendlier format? I just want to put the column data into a pandas DataFrame.

Answers (3)

ldz

You could split each line on commas, strip the whitespace from each part, and build a DataFrame from the result:

import pandas as pd

data = [[item.strip() for item in line.split(',')] for line in idata]
df = pd.DataFrame(data)

To convert the DataFrame's string values to numbers, pd.to_numeric can then be applied to each column:

df = df.apply(pd.to_numeric)
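If you would rather get rid of the columns that are blank on every line before converting, one possible variation (not part of the answer above) is to keep only the columns that contain at least one non-empty string and pass errors='coerce' so that anything unparseable becomes NaN instead of raising. A minimal sketch, assuming idata looks exactly like the list in the question:

```python
import pandas as pd

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

# Split on commas and strip whitespace (this also removes the trailing '\n').
data = [[item.strip() for item in line.split(',')] for line in idata]
df = pd.DataFrame(data)

# Keep only the columns that have at least one non-empty value.
df = df.loc[:, (df != '').any()]

# Convert column by column; errors='coerce' maps unparseable strings to NaN.
df = df.apply(pd.to_numeric, errors='coerce')
print(df)
```

The surviving columns keep their original positional labels (0, 1, 2, 4, 5, 7, 8), which makes it easy to see which fields were dropped.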

jez

try:
    from io import StringIO  # Python 3
except ImportError:
    from StringIO import StringIO  # Python 2

import pandas as pd

df = pd.read_csv(StringIO(''.join(idata)), index_col=0, header=None, sep=r',\s*', engine='python')

print(df)

# prints:
#       1   2   3       4   5   6        7   8   9  10
# 0                                                   
# 1   0.0   0 NaN  6.2386   0 NaN  33.2166   0 NaN NaN
# 2   5.0   0 NaN  6.2385   0 NaN  33.2166   0 NaN NaN
# 3  10.0   0 NaN  6.2306   0 NaN  33.2175   0 NaN NaN
# 4  15.0   0 NaN  6.2359   0 NaN  33.2176   0 NaN NaN
# 5  20.0   0 NaN  6.2387   0 NaN  33.2175   0 NaN NaN

Drop header=None if you can prepend a row to idata that holds meaningful column labels. Drop sep=r',\s*', engine='python' if you don't mind the blank columns containing blank strings instead of NaN.
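Instead of adding a header row to idata, labels can also be supplied through read_csv's names parameter, after which the all-NaN blank columns can be dropped. A sketch under that approach; the column names below are hypothetical placeholders, not taken from the WOD documentation:

```python
from io import StringIO

import pandas as pd

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

# Hypothetical labels, one per comma-separated field (11 per line, counting
# the empty field after the trailing comma). Check them against your data.
names = ['obs', 'depth', 'depth_flag', 'blank_1',
         'temp', 'temp_flag', 'blank_2',
         'sal', 'sal_flag', 'blank_3', 'blank_4']

df = pd.read_csv(StringIO(''.join(idata)), header=None, names=names,
                 index_col='obs', sep=r',\s*', engine='python')

# The blank fields are parsed as NaN; drop the columns that are NaN everywhere.
df = df.dropna(axis=1, how='all')
print(df)
```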

norok2

You could use a combination of string manipulation (i.e. strip() and split()) and list comprehensions:

import numpy as np


idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

ll = [[float(x.strip()) for x in s.split(',') if x.strip()] for s in idata]
print(np.array(ll))
# [[ 1.      0.      0.      6.2386  0.     33.2166  0.    ]
#  [ 2.      5.      0.      6.2385  0.     33.2166  0.    ]
#  [ 3.     10.      0.      6.2306  0.     33.2175  0.    ]
#  [ 4.     15.      0.      6.2359  0.     33.2176  0.    ]
#  [ 5.     20.      0.      6.2387  0.     33.2175  0.    ]]
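One caveat with filtering on if x.strip(): if a blank field is filled on some rows but not others, the rows end up with different lengths and the values no longer line up by column. A sketch of an alternative (not part of the answer above) that keeps every field in place, maps blanks to NaN, and only then drops the columns that are blank on every row; to_float is a hypothetical helper introduced here:

```python
import numpy as np

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']


def to_float(field):
    """Hypothetical helper: convert one field to float, blanks become NaN."""
    field = field.strip()
    return float(field) if field else np.nan


# Keep every field so each column index means the same thing on every row.
arr = np.array([[to_float(x) for x in s.split(',')] for s in idata])

# Drop columns that are NaN in every row (the always-blank fields).
arr = arr[:, ~np.isnan(arr).all(axis=0)]
print(arr)
```

For the sample data this yields the same 5 x 7 array as above, but a field that is only sometimes blank would stay in its own column as NaN rather than shifting its neighbours.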

The same list of lists can also be passed to the pandas DataFrame constructor:

import pandas as pd


df = pd.DataFrame(ll)
print(df)
#      0     1    2       3    4        5    6
# 0  1.0   0.0  0.0  6.2386  0.0  33.2166  0.0
# 1  2.0   5.0  0.0  6.2385  0.0  33.2166  0.0
# 2  3.0  10.0  0.0  6.2306  0.0  33.2175  0.0
# 3  4.0  15.0  0.0  6.2359  0.0  33.2176  0.0
# 4  5.0  20.0  0.0  6.2387  0.0  33.2175  0.0
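The DataFrame constructor also accepts a columns argument, so the positional labels 0 to 6 can be replaced with descriptive names. A sketch; the names are hypothetical and should be adjusted to what each column actually holds in your WOD file:

```python
import pandas as pd

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

ll = [[float(x.strip()) for x in s.split(',') if x.strip()] for s in idata]

# Hypothetical column names for the seven non-blank fields.
columns = ['obs', 'depth', 'depth_flag', 'temp', 'temp_flag', 'sal', 'sal_flag']
df = pd.DataFrame(ll, columns=columns)

# Store the counter and flag columns as integers, then index by the counter.
df = df.astype({'obs': int, 'depth_flag': int,
                'temp_flag': int, 'sal_flag': int}).set_index('obs')
print(df)
```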
