Reputation: 137
I have a file with data of multiple transaction status, refer "FILE PIC". Now I want to get different dataframes for all transaction status without total of third column. The string in first column are fixed for all the transaction status and row below the string are column names.
By the strings, I want to differentiate the dataframes, for reference see required output below.
SUCCESS TRANSACTIONS
SA SB SC
255623 145244651 55
45454544 NA 85
1300256655 3511614646 101
241
REJECTED TRANSACTIONS
RJA RJB RJC
3652656 NA 78
7884045455 45681247 97
148428 1242154661 75
250
REFUND TRANSACTIONS
RA RB RC
5546446166 NA 545
1230001556 NA 75
4548754545 4548545877 52
672
SA SB SC
255623 145244651 55
45454544 NA 85
1300256655 3511614646 101
RJA RJB RJC
3652656 NA 78
7884045455 45681247 97
148428 1242154661 75
RA RB RC
5546446166 NA 545
1230001556 NA 75
4548754545 4548545877 52
Upvotes: 1
Views: 159
Reputation: 17156
One way to parse file into dataframes.
Code
from io import StringIO
import re
def parse_file(filenm):
'''
split file into block of rows based upon 1 or more blank rows
-re.split('\n\s*\n', data)
convert each block into dataframe (function make_dataframe)
'''
def make_dataframe(p):
' Create dataframe from block of rows from file (not including first/last rows)'
return pd.read_csv(
StringIO('\n'.join(p[1:-1])),
sep = '\s+',
engine = 'python')
with open(filenm) as f:
return {p[0]:make_dataframe(p) # p[0] is section name
for s in re.split('\n\s*\n', f.read()) # split file on 1+ blank lines
if (p:=s.split('\n'))} # p has block of lines
Usage
# Parse file into dataframes
dfs = parse_files('test.txt')
# Show dataframes
dfs = parse_file('test.txt')
for k, df in dfs.items():
print(k)
print(df)
print()
Output
SUCCESS TRANSACTIONS DF
SA SB SC
0 255623 145244651 55
1 45454544 NaN 85
2 1300256655 3511614646 101
REJECTED TRANSACTIONS DF
RJA RJB RJC
0 3652656 NaN 78
1 7884045455 45681247 97
2 148428 1242154661 75
REFUND TRANSACTIONS DF
RA RB RC
0 5546446166 NaN 545
1 1230001556 NaN 75
2 4548754545 4548545877 52
Upvotes: 2
Reputation:
If the data are stored in string ds
:
print('\n'.join(['\n'.join(ta.split('\n')[:-1]) for ta in ds.split(\n\n)]))
prints what you have requested.
SUCCESS TRANSACTIONS
SA SB SC
255623 145244651 55
45454544 NA 85
1300256655 3511614646 101
REJECTED TRANSACTIONS
RJA RJB RJC
3652656 NA 78
7884045455 45681247 97
148428 1242154661 75
REFUND TRANSACTIONS
RA RB RC
5546446166 NA 545
1230001556 NA 75
4548754545 4548545877 52
Below the entire code printing the output above:
data_as_string = """\
SUCCESS TRANSACTIONS
SA SB SC
255623 145244651 55
45454544 NA 85
1300256655 3511614646 101
241
REJECTED TRANSACTIONS
RJA RJB RJC
3652656 NA 78
7884045455 45681247 97
148428 1242154661 75
250
REFUND TRANSACTIONS
RA RB RC
5546446166 NA 545
1230001556 NA 75
4548754545 4548545877 52
672"""
ds = data_as_string
nl = '\n'
dnl = '\n\n'
# with finally:
print(nl.join([nl.join(ta.split(nl)[:-1]) for ta in ds.split(dnl)]))
Upvotes: 0
Reputation: 7230
Let's prepare the data:
import pandas as pd
from io import StringIO
data = '''
SUCCESS TRANSACTIONS
SA SB SC
255623 145244651 55
45454544 NA 85
1300256655 3511614646 101
241
REJECTED TRANSACTIONS
RJA RJB RJC
3652656 NA 78
7884045455 45681247 97
148428 1242154661 75
250
REFUND TRANSACTIONS
RA RB RC
5546446166 NA 545
1230001556 NA 75
4548754545 4548545877 52
672
'''
df = pd.read_csv(StringIO(data), sep='\s\s+', engine='python', names=[*'abc'])
I'm not sure if this data are exactly the same as the one on your side. So be careful with the code below. Anyway, that's what you posted, so be it.
The common way to split such construction is like this:
cumsum
method to mark rows of a contiguous group with the same number;groupby
.Here's how it can look in code:
grouper = df['a'].str.endswith('TRANSACTIONS').cumsum()
groups = {}
for _, gr in df.groupby(grouper):
name = gr.iloc[0, 0]
groups[name] = (
gr[1:-1] # skip the very first row with a general name, and the last one with total
.T.set_index(gr.index[1]).T # use the row under the general name as a column names
.reset_index(drop=True) # drop old indexes
.rename_axis(columns='')
)
for name, group in groups.items():
display(group.style.set_caption(name))
The output:
Upvotes: 3