Reputation: 47
I have a double-spaced txt file with many tables in it but no column names. Each table is separated with an entity title "$ XXX"
For example:
$ Address
1235 ABC 1W4AC 112
468 DSW 1dwq8 158
$ Label Mark
DQQ FSAF FSDSADAS
EWQ EWQEQW DWQd
...
I want to extract a table then modify it, and export .txt along with all other tables, any ideas?
df = pd.read_csv("record.txt", header=None, names=range(3))
table_names = ["$ Address", "$ Label Mark", "$ XXX"]
groups = df[0].isin(table_names).cumsum()
tables = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}
I am trying this code, but it shows all tables below it, failed to extract the one I want to select. Thanks a lot!
Upvotes: 0
Views: 198
Reputation: 260490
You can first read you text, split it the way you want and pass it to pandas.read_csv
using io.StringIO
.
import pandas as pd
import io
# read the csv
with open('text.csv') as f:
txt = f.read()
# how to read all tables:
all_tables = [pd.read_csv(io.StringIO(t.split('\n',1)[1]), header=None)
for t in txt.split('$') if t.strip()]
# how to read only the (first) table called "Label Mark"
# NB. if there is a chance that the table cannot exist, then it is
# better to create the list first and then check that it contains
# at least one element
pd.read_csv(io.StringIO([t.split('\n',1)[1]
for t in txt.split('$')
if t.strip().startswith('Label Mark')][0]),
header=None)
output #1:
[ 0
0 1235 ABC 1W4AC 112
1 468 DSW 1dwq8 158,
0
0 DQQ FSAF FSDSADAS
1 EWQ EWQEQW DWQd]
output #2:
0
0 DQQ FSAF FSDSADAS
1 EWQ EWQEQW DWQd
NB. is the separator in the csv is not comma but spaces, then the sep='\s+'
option should be added to pd.read_csv
Upvotes: 1