Reputation: 31
I have text file which contains data for two tables with two column headers. I want to read each line and save lines or rows in two files for each table. I want lines after (number amount) columns header in one file and (code volume dim) column header in other. Or, I want to separate data for similar table.
First few lines of the file. Number of rows in tables may vary.
number amount
10 34
23 65
54 07
code volume dim
1 56 34
23 57 565
number amount
40 674
73 2365
code volume dim
341 3456 6534
23 0957 908565
number amount
210 4534
2343 4565
same pattern repeats.
Upvotes: 2
Views: 701
Reputation: 863166
If text number
is same in all file, and text code
too, you can use read_csv
and then filter columns rows by subset by contains
and isnull
,
notnull
:
import pandas as pd
import numpy as np
import io
temp=u""" number amount
10 34
23 65
54 07
code volume dim
1 56 34
23 57 565
number amount
40 674
73 2365
code volume dim
341 3456 6534
23 0957 908565
number amount
210 4534
2343 4565"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep="\s+", index_col=None, skip_blank_lines=True, names=['a', 'b', 'c'])
print df
a b c
0 number amount NaN
1 10 34 NaN
2 23 65 NaN
3 54 07 NaN
4 code volume dim
5 1 56 34
6 23 57 565
7 number amount NaN
8 40 674 NaN
9 73 2365 NaN
10 code volume dim
11 341 3456 6534
12 23 0957 908565
13 number amount NaN
14 210 4534 NaN
15 2343 4565 NaN
df = df[~((df.a.str.contains('number')) | (df.a.str.contains('code')))]
df1 = df[df.c.isnull()]
df1 = df1[['a', 'b']].reset_index(drop=True)
df1.columns = ['number','amount']
#if all numbers are integers
df1 = df1.astype(int)
print df1
number amount
0 10 34
1 23 65
2 54 7
3 40 674
4 73 2365
5 210 4534
6 2343 4565
df2 = df[df.c.notnull()].reset_index(drop=True)
df2.columns = ['code','volume', 'dim']
#if all numbers are integers
df2 = df2.astype(int)
print df2
code volume dim
0 1 56 34
1 23 57 565
2 341 3456 6534
3 23 957 908565
Upvotes: 1