yashraj
yashraj

Reputation: 31

Read rows in text file based on different column headers using Python

I have text file which contains data for two tables with two column headers. I want to read each line and save lines or rows in two files for each table. I want lines after (number amount) columns header in one file and (code volume dim) column header in other. Or, I want to separate data for similar table.

First few lines of the file. Number of rows in tables may vary.

    number amount
    10 34
    23 65
    54 07

    code volume dim
    1 56 34
    23 57 565

    number amount
    40 674
    73 2365

    code volume dim
    341 3456 6534
    23 0957 908565

    number amount
    210 4534
    2343 4565

same pattern repeats.

Upvotes: 2

Views: 701

Answers (1)

jezrael
jezrael

Reputation: 863166

If text number is same in all file, and text code too, you can use read_csv and then filter columns rows by subset by contains and isnull, notnull:

import pandas as pd
import numpy as np
import io

temp=u""" number amount
    10 34
    23 65
    54 07

    code volume dim
    1 56 34
    23 57 565

    number amount
    40 674
    73 2365

    code volume dim
    341 3456 6534
    23 0957 908565

    number amount
    210 4534
    2343 4565"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep="\s+", index_col=None, skip_blank_lines=True, names=['a', 'b', 'c'])
print df
         a       b       c
0   number  amount     NaN
1       10      34     NaN
2       23      65     NaN
3       54      07     NaN
4     code  volume     dim
5        1      56      34
6       23      57     565
7   number  amount     NaN
8       40     674     NaN
9       73    2365     NaN
10    code  volume     dim
11     341    3456    6534
12      23    0957  908565
13  number  amount     NaN
14     210    4534     NaN
15    2343    4565     NaN
df = df[~((df.a.str.contains('number')) | (df.a.str.contains('code')))]

df1 = df[df.c.isnull()]
df1 = df1[['a', 'b']].reset_index(drop=True)
df1.columns = ['number','amount']

#if all numbers are integers
df1 = df1.astype(int)
print df1
   number  amount
0      10      34
1      23      65
2      54       7
3      40     674
4      73    2365
5     210    4534
6    2343    4565

df2 = df[df.c.notnull()].reset_index(drop=True)
df2.columns = ['code','volume', 'dim']
#if all numbers are integers
df2 = df2.astype(int)
print df2
   code  volume     dim
0     1      56      34
1    23      57     565
2   341    3456    6534
3    23     957  908565

Upvotes: 1

Related Questions