Reputation: 755
Below is the content from myfile.csv
1st 2nd 3rd 4th 5th
2061100 10638650 -8000 25 [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
2061800 10639100 -8100 26 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
2061150 10638750 -8250 25 [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
2061650 10639150 -8200 25 [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
2061350 10638800 -8250 3 [5.0, 5.0, 5.0]
2060950 10638700 -8000 1 [1.0]
2061700 10639100 -8100 11 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
2061050 10638800 -8250 6 [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
2061500 10639150 -8200 1 [4.0]
2061250 10638850 -8150 16 [5.0, 5.0, 5.0, 5.0]
My current code:
from numpy import genfromtxt
mydata = genfromtxt('myfile.csv', delimiter=',')
arr = np.array(mydata)
col5 = arr[:,4]
However, I want to read the 5th column as a list and then read all elements from the list to do further calculation. What shall I do?
Upvotes: 0
Views: 136
Reputation: 40973
Pandas can read fixed width files (in contrast to tab/comma delimited files) like yours:
import pandas as pd
import ast
df = pd.read_fwf('test.txt', colspecs=[(41,100)])['5th']\
.apply(lambda x: ast.literal_eval(x))
You get:
>>> df
0 [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
1 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
2 [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
3 [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
4 [5.0, 5.0, 5.0]
5 [1.0]
6 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
7 [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
8 [4.0]
9 [5.0, 5.0, 5.0, 5.0]
Name: 5th, dtype: object
Upvotes: 1
Reputation: 23773
If the whitespace between the columns are tabs:
import csv, ast, pprint
result = list()
with open('in.txt') as in_file:
reader = csv.reader(in_file, delimiter = '\t')
for line in reader:
line[:4] = map(int, line[:4])
line[4] = ast.literal_eval(line[4])
result.append(line)
pprint.pprint(result)
>>>
[[2061100, 10638650, -8000, 25, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]],
[2061800, 10639100, -8100, 26, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]],
[2061150, 10638750, -8250, 25, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]],
[2061650, 10639150, -8200, 25, [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]],
[2061350, 10638800, -8250, 3, [5.0, 5.0, 5.0]],
[2060950, 10638700, -8000, 1, [1.0]],
[2061700, 10639100, -8100, 11, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]],
[2061050, 10638800, -8250, 6, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]],
[2061500, 10639150, -8200, 1, [4.0]],
[2061250, 10638850, -8150, 16, [5.0, 5.0, 5.0, 5.0]]]
>>>
A variation on that theme:
with open('in.txt') as in_file:
reader = csv.reader(in_file, delimiter = '\t')
result = [[ast.literal_eval(item) for item in line] for line in reader]
Upvotes: 2
Reputation: 104032
I suppose I would be tempted to just do this manually:
with open(fn) as f:
header=next(f).strip()
print(header)
for row in f:
row=row.rstrip()
lp,_,rp=row.partition('[')
rp=rp.strip(']')
lp_data=list(map(int, lp.split()))
rp_data=list(map(float, rp.split(',')))
print(lp_data+[rp_data])
Prints:
1st 2nd 3rd 4th 5th
[2061100, 10638650, -8000, 25, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
[2061800, 10639100, -8100, 26, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]]
[2061150, 10638750, -8250, 25, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]]
[2061650, 10639150, -8200, 25, [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]]
[2061350, 10638800, -8250, 3, [5.0, 5.0, 5.0]]
[2060950, 10638700, -8000, 1, [1.0]]
[2061700, 10639100, -8100, 11, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]]
[2061050, 10638800, -8250, 6, [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]]
[2061500, 10639150, -8200, 1, [4.0]]
[2061250, 10638850, -8150, 16, [5.0, 5.0, 5.0, 5.0]]
Upvotes: 1