Reputation: 383
I am working with data from the World Ocean Database (WOD), and somehow I ended up with a list that looks like this:
idata = [
' 1, 0.0,0, , 6.2386,0, , 33.2166,0, ,\n',
' 2, 5.0,0, , 6.2385,0, , 33.2166,0, ,\n',
' 3, 10.0,0, , 6.2306,0, , 33.2175,0, ,\n',
' 4, 15.0,0, , 6.2359,0, , 33.2176,0, ,\n',
' 5, 20.0,0, , 6.2387,0, , 33.2175,0, ,\n']
Is there an easy way to convert this structure into a NumPy array or some friendlier format? I just want to put the columns into a pandas DataFrame.
Upvotes: 1
Views: 98
Reputation: 2215
You can split each line on commas, strip the whitespace from each part, and pass the resulting nested list to the DataFrame constructor:
import pandas as pd
data = [[item.strip() for item in line.split(',')] for line in idata]
df = pd.DataFrame(data)
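To convert the resulting strings to numbers, apply pd.to_numeric to each column. Passing errors='coerce' makes it safe: any entry that cannot be parsed (such as the blank fields) becomes NaN instead of raising an error:
df = df.apply(pd.to_numeric, errors='coerce')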
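Putting those steps together, a minimal sketch (assuming idata is the list from the question) that also drops the columns which are blank in every row, i.e. the ", ," separators and the trailing comma:

```python
import pandas as pd

idata = [
    ' 1, 0.0,0, , 6.2386,0, , 33.2166,0, ,\n',
    ' 2, 5.0,0, , 6.2385,0, , 33.2166,0, ,\n',
    ' 3, 10.0,0, , 6.2306,0, , 33.2175,0, ,\n',
    ' 4, 15.0,0, , 6.2359,0, , 33.2176,0, ,\n',
    ' 5, 20.0,0, , 6.2387,0, , 33.2175,0, ,\n',
]

# Split every line on commas and strip whitespace; blank fields become ''.
data = [[item.strip() for item in line.split(',')] for line in idata]
df = pd.DataFrame(data)

# errors='coerce' turns anything that cannot be parsed (the '' entries) into NaN.
df = df.apply(pd.to_numeric, errors='coerce')

# Drop the columns that are NaN in every row.
df = df.dropna(axis=1, how='all')
print(df)
```

The surviving columns keep their original positional labels (0, 1, 2, 4, 5, 7, 8), so you can still tell which field of the raw line each one came from before renaming them.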
Upvotes: 1
Reputation: 15349
try:
    from io import StringIO  # Python 3
except ImportError:
    from StringIO import StringIO  # Python 2
import pandas as pd
df = pd.read_csv(StringIO(''.join(idata)), index_col=0, header=None, sep=r',\s*', engine='python')
print(df)
# prints:
# 1 2 3 4 5 6 7 8 9 10
# 0
# 1 0.0 0 NaN 6.2386 0 NaN 33.2166 0 NaN NaN
# 2 5.0 0 NaN 6.2385 0 NaN 33.2166 0 NaN NaN
# 3 10.0 0 NaN 6.2306 0 NaN 33.2175 0 NaN NaN
# 4 15.0 0 NaN 6.2359 0 NaN 33.2176 0 NaN NaN
# 5 20.0 0 NaN 6.2387 0 NaN 33.2175 0 NaN NaN
Remove header=None if you can include an initial row in idata that specifies meaningful column labels. Remove sep=r',\s*', engine='python' if you are happy for the blank columns to contain blank (whitespace-only) strings instead of NaN.
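If you already know what each field means, read_csv can assign labels through its names parameter instead of needing a header row. The sketch below assumes the 11 fields per line seen in the printed output above (the index plus columns 1 to 10); the labels themselves (depth, temperature, salinity and their flags) are hypothetical guesses at the WOD layout, not taken from the WOD documentation, so adjust them to your data.

```python
from io import StringIO

import pandas as pd

idata = [
    ' 1, 0.0,0, , 6.2386,0, , 33.2166,0, ,\n',
    ' 2, 5.0,0, , 6.2385,0, , 33.2166,0, ,\n',
    ' 3, 10.0,0, , 6.2306,0, , 33.2175,0, ,\n',
    ' 4, 15.0,0, , 6.2359,0, , 33.2176,0, ,\n',
    ' 5, 20.0,0, , 6.2387,0, , 33.2175,0, ,\n',
]

# Hypothetical labels, one per comma-separated field (11 in total).
# 'blank*' marks the fields that are empty in every row.
names = ['row', 'depth', 'depth_flag', 'blank1',
         'temp', 'temp_flag', 'blank2',
         'salinity', 'sal_flag', 'blank3', 'blank4']

df = pd.read_csv(StringIO(''.join(idata)), header=None, names=names,
                 index_col='row', sep=r',\s*', engine='python')

# The placeholder columns only ever hold NaN, so drop them.
df = df.drop(columns=['blank1', 'blank2', 'blank3', 'blank4'])
print(df)
```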
Upvotes: 0
Reputation: 26896
You could combine string methods (strip() and split()) with list comprehensions:
import numpy as np
idata = [
' 1, 0.0,0, , 6.2386,0, , 33.2166,0, ,\n',
' 2, 5.0,0, , 6.2385,0, , 33.2166,0, ,\n',
' 3, 10.0,0, , 6.2306,0, , 33.2175,0, ,\n',
' 4, 15.0,0, , 6.2359,0, , 33.2176,0, ,\n',
' 5, 20.0,0, , 6.2387,0, , 33.2175,0, ,\n']
# Split each line on commas, skip the blank fields, and convert the rest to float.
ll = [[float(x.strip()) for x in s.split(',') if x.strip()] for s in idata]
print(np.array(ll))
# [[ 1. 0. 0. 6.2386 0. 33.2166 0. ]
# [ 2. 5. 0. 6.2385 0. 33.2166 0. ]
# [ 3. 10. 0. 6.2306 0. 33.2175 0. ]
# [ 4. 15. 0. 6.2359 0. 33.2176 0. ]
# [ 5. 20. 0. 6.2387 0. 33.2175 0. ]]
The same nested list can also be passed to the pandas DataFrame constructor:
import pandas as pd
df = pd.DataFrame(ll)
print(df)
# 0 1 2 3 4 5 6
# 0 1.0 0.0 0.0 6.2386 0.0 33.2166 0.0
# 1 2.0 5.0 0.0 6.2385 0.0 33.2166 0.0
# 2 3.0 10.0 0.0 6.2306 0.0 33.2175 0.0
# 3 4.0 15.0 0.0 6.2359 0.0 33.2176 0.0
# 4 5.0 20.0 0.0 6.2387 0.0 33.2175 0.0
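One caveat with skipping blank fields per line: if a single row were missing a value, that row would come out shorter and the remaining values would shift one column to the left. A minimal sketch of an alternative (assuming the same idata) keeps every field, maps blanks to NaN so each value stays in its original column, and only then drops the columns that are empty in every row:

```python
import numpy as np

idata = [
    ' 1, 0.0,0, , 6.2386,0, , 33.2166,0, ,\n',
    ' 2, 5.0,0, , 6.2385,0, , 33.2166,0, ,\n',
    ' 3, 10.0,0, , 6.2306,0, , 33.2175,0, ,\n',
    ' 4, 15.0,0, , 6.2359,0, , 33.2176,0, ,\n',
    ' 5, 20.0,0, , 6.2387,0, , 33.2175,0, ,\n',
]


def parse_field(field):
    """Convert one comma-separated field to float; blank fields become NaN."""
    field = field.strip()
    return float(field) if field else np.nan


# Every line yields the same number of fields, so the array is rectangular.
rows = [[parse_field(x) for x in line.split(',')] for line in idata]
arr = np.array(rows)

# Keep only the columns that contain at least one real value.
arr = arr[:, ~np.isnan(arr).all(axis=0)]
print(arr)
```

For this input the result matches the array above; the difference only shows up when some rows have gaps.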
Upvotes: 1