Reputation: 121
I have a .txt file with a single line and many numbers separated by a variable amount of spaces.
32 45 2.65 -845 1 -84 97.236 454 35.78 77.12 948.87
151 -23.5 -787.48 13.005 31
I know every x numbers (being x a fix amount in every file) there should be a break. For instance the first 4 numbers in 4 columns the first row, next 4 the second row and so on.
+-------+---------+--------+------+ | col1 | col2 | col3 | col4 | +-------+---------+--------+------+ | 32 | 45 | 2.65 | -845 | | 1 | -84 | 97.236 | 454 | | 35.78 | 77.12 | 948.87 | 151 | | -23.5 | -787.48 | 13.005 | 31 | +-------+---------+--------+------+
The objective is to create a .csv file with the right amount of columns. So far I have been able to separate the numbers:
import pandas as pd
data = pd.read_csv(table, sep='\s{2,}', header = None, engine='python')
export = data.to_csv(csvtable, header=False)
This creates a csv with one row and as many columns as values but I'd like to have the data in a specific amount of columns, 4 in this case.
Upvotes: 1
Views: 842
Reputation: 13921
I would load the data using Numpy
import numpy
import pandas
data = numpy.loadtxt('temp.txt').reshape(-1, 4)
pandas.DataFrame(data)
Edit: Swapped -1 and 4.
Upvotes: 1
Reputation: 863351
Create list of all possible values from file, then reshape by numpy.reshape for 4 columns DataFrame:
with open('data.txt') as f:
L = [x for line in f for x in line.strip().split()]
print (L)
['32', '45', '2.65', '-845', '1', '-84', '97.236', '454',
'35.78', '77.12', '948.87', '151', '-23.5', '-787.48', '13.005', '31']
df = pd.DataFrame(np.array(L).reshape(-1, 4))
print (df)
0 1 2 3
0 32 45 2.65 -845
1 1 -84 97.236 454
2 35.78 77.12 948.87 151
3 -23.5 -787.48 13.005 31
But solution not working, if not possible create full 4 columns, then it is a bit complicated:
#missing last value
print (L)
['32', '45', '2.65', '-845', '1', '-84', '97.236', '454', '35.78',
'77.12', '948.87', '151', '-23.5', '-787.48', '13.005']
arr = np.empty(((len(L) - 1)//4 + 1)*4, dtype='O')
arr[:len(L)] = L
df = pd.DataFrame(arr.reshape((-1, 4))).fillna('0')
print(df)
0 1 2 3
0 32 45 2.65 -845
1 1 -84 97.236 454
2 35.78 77.12 948.87 151
3 -23.5 -787.48 13.005 0
Upvotes: 2