Albert
Albert

Reputation: 121

Change amount of columns in Pandas

I have a .txt file with a single line and many numbers separated by a variable amount of spaces.

32 45 2.65   -845     1 -84    97.236        454   35.78 77.12    948.87       
         151    -23.5         -787.48     13.005   31

I know every x numbers (being x a fix amount in every file) there should be a break. For instance the first 4 numbers in 4 columns the first row, next 4 the second row and so on.

+-------+---------+--------+------+
| col1  |  col2   |  col3  | col4 |
+-------+---------+--------+------+
| 32    | 45      | 2.65   | -845 |
| 1     | -84     | 97.236 |  454 |
| 35.78 | 77.12   | 948.87 |  151 |
| -23.5 | -787.48 | 13.005 |   31 |
+-------+---------+--------+------+

The objective is to create a .csv file with the right amount of columns. So far I have been able to separate the numbers:

import pandas as pd
data = pd.read_csv(table, sep='\s{2,}', header = None, engine='python')
export = data.to_csv(csvtable, header=False)

This creates a csv with one row and as many columns as values but I'd like to have the data in a specific amount of columns, 4 in this case.

Upvotes: 1

Views: 842

Answers (2)

hirolau
hirolau

Reputation: 13921

I would load the data using Numpy

import numpy
import pandas

data = numpy.loadtxt('temp.txt').reshape(-1, 4)
pandas.DataFrame(data)

Edit: Swapped -1 and 4.

Upvotes: 1

jezrael
jezrael

Reputation: 863351

Create list of all possible values from file, then reshape by numpy.reshape for 4 columns DataFrame:

with open('data.txt') as f:
    L = [x for line in f for x in line.strip().split()]
    print (L)
['32', '45', '2.65', '-845', '1', '-84', '97.236', '454', 
 '35.78', '77.12', '948.87', '151', '-23.5', '-787.48', '13.005', '31']


df = pd.DataFrame(np.array(L).reshape(-1, 4))
print (df)
       0        1       2     3
0     32       45    2.65  -845
1      1      -84  97.236   454
2  35.78    77.12  948.87   151
3  -23.5  -787.48  13.005    31

But solution not working, if not possible create full 4 columns, then it is a bit complicated:

#missing last value
print (L)
['32', '45', '2.65', '-845', '1', '-84', '97.236', '454', '35.78', 
 '77.12', '948.87', '151', '-23.5', '-787.48', '13.005']

arr = np.empty(((len(L) - 1)//4 + 1)*4, dtype='O')
arr[:len(L)] = L
df = pd.DataFrame(arr.reshape((-1, 4))).fillna('0')
print(df)
       0        1       2     3
0     32       45    2.65  -845
1      1      -84  97.236   454
2  35.78    77.12  948.87   151
3  -23.5  -787.48  13.005     0

Upvotes: 2

Related Questions