Pandas Multi-Table Extraction

Question

I have a double-spaced txt file with many tables in it but no column names. Each table is separated with an entity title "$ XXX"

For example:

$  Address
  1235  ABC  1W4AC 112
  468  DSW  1dwq8  158
$  Label Mark
  DQQ  FSAF  FSDSADAS 
  EWQ  EWQEQW   DWQd

...

I want to extract a table then modify it, and export .txt along with all other tables, any ideas?

df = pd.read_csv("record.txt", header=None, names=range(3))
table_names = ["$ Address", "$ Label Mark", "$ XXX"]
groups = df[0].isin(table_names).cumsum()
tables = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}

I am trying this code, but it shows all tables below it, failed to extract the one I want to select. Thanks a lot!

mozway · Accepted Answer

You can first read you text, split it the way you want and pass it to pandas.read_csv using io.StringIO.

import pandas as pd
import io

# read the csv
with open('text.csv') as f:
    txt = f.read()

# how to read all tables:
all_tables = [pd.read_csv(io.StringIO(t.split('
',1)[1]), header=None)
              for t in txt.split('$') if t.strip()]

# how to read only the (first) table called "Label Mark"
# NB. if there is a chance that the table cannot exist, then it is
# better to create the list first and then check that it contains
# at least one element
pd.read_csv(io.StringIO([t.split('
',1)[1]
                         for t in txt.split('$')
                         if t.strip().startswith('Label Mark')][0]), 
            header=None)

output #1:

[                        0
 0    1235  ABC  1W4AC 112
 1    468  DSW  1dwq8  158,
                         0
 0    DQQ  FSAF  FSDSADAS 
 1      EWQ  EWQEQW   DWQd]

output #2:

                        0
0    DQQ  FSAF  FSDSADAS 
1      EWQ  EWQEQW   DWQd

NB. is the separator in the csv is not comma but spaces, then the sep='\s+' option should be added to pd.read_csv

Pandas Multi-Table Extraction

Answers (1)

Related Questions