Potato

Reputation: 111

Extract only specific texts from a txt file via python

I have a text file that has the following variables along with their values.

COM 0.95958  eh 26.9817  ehc 26.9817  ehoff    0  ew 0.181982  ewoff -0.00400919  oeh 429.788  sp_icr 15.3199  sp_il -11.4382  sp_pdelay -1.53578e-09  sp_pk_icr 15.0735  sp_pk_icr_f 7.81609e+09  sp_pk_il -12.2937  sp_pk_il_f 7.71614e+09  sp_pk_ild 3.05223  sp_pk_ild_f 7.3963e+08  sp_pk_rxrl -0.0909508  sp_pk_rxrl_f 3.01849e+09  sp_pk_txrl -6.33623  sp_pk_txrl_f 6.5967e+08  sp_rxrl -0.187543  sp_txrl -19.2629 

How do I extract only specific variables and their corresponding values? How do I extract say COM and its value as well as ehc and its value?

import glob
import os

for filename in glob.glob(os.path.join(path, '*.log')):
    with open(filename, 'rt') as in_file:
        line = in_file.readline()   # renamed from str, which shadows the built-in

How do I proceed after reading each line? I know that I can use substrings and extract only the needed text but is there another way I could do it?

Upvotes: 0

Views: 259

Answers (1)

Blownhither Ma

Reputation: 1471

I'm assuming the file contains a repeated pattern of an ASCII name followed by a float-like number, separated by whitespace. Given that, a regex is a convenient way to parse the string.

import re

s = "COM 0.95958  eh 26.9817  ehc 26.9817  ehoff    0  ew 0.181982  ewoff -0.00400919  oeh 429.788  sp_icr 15.3199  sp_il -11.4382  sp_pdelay -1.53578e-09  sp_pk_icr 15.0735  sp_pk_icr_f 7.81609e+09  sp_pk_il -12.2937  sp_pk_il_f 7.71614e+09  sp_pk_ild 3.05223  sp_pk_ild_f 7.3963e+08  sp_pk_rxrl -0.0909508  sp_pk_rxrl_f 3.01849e+09  sp_pk_txrl -6.33623  sp_pk_txrl_f 6.5967e+08  sp_rxrl -0.187543  sp_txrl -19.2629 "
r = re.compile(r'(\w+)\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)')   # 2 groups: name, number (optional exponent)

d = dict(r.findall(s))
print(d)                   # {'COM': '0.95958', 'eh': '26.9817', ...
print(d['COM'])            # 0.95958 (but it is str)
print(float(d['COM']))     # 0.95958

The values are left as strings. If you need numbers, convert them with float(d[key]).
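To pull out only the variables the question asks about (COM and ehc), one possible sketch is to filter the parsed pairs against a set of wanted names and convert to float as you go. The names PAIR, extract and wanted are my own illustration, not part of the answer above, and the pattern assumes values may carry an exponent such as 7.81609e+09.

```python
import re

# Name, whitespace, then a number with an optional fraction and exponent.
PAIR = re.compile(r'(\w+)\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)')

def extract(text, wanted):
    """Return {name: float(value)} for the names in `wanted` found in text."""
    return {name: float(value)
            for name, value in PAIR.findall(text)
            if name in wanted}

sample = "COM 0.95958  eh 26.9817  ehc 26.9817  ehoff    0  sp_pdelay -1.53578e-09"
print(extract(sample, {"COM", "ehc"}))   # {'COM': 0.95958, 'ehc': 26.9817}
```

Names that are not present in the text are simply absent from the result, so check with `in` or use .get() before indexing.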

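  • If the file has multiple lines but the same pattern holds, read it whole with s = open(FILE_NAME).read(). \s already matches newlines, so no replacement is needed. If you do strip newlines, replace them with a space rather than an empty string, as in .replace('\n', ' '), so that the last value on one line does not merge with the first name on the next.
  • If variable names must start with a letter, replace the name part of the regex with ([a-zA-Z]\w*)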
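If each line of the file is a separate record with the same variable names, merging everything into one dict would lose all but the last line. A minimal sketch of parsing line by line instead, assuming that layout; parse_lines is a hypothetical helper, and the pattern uses the letter-first name form from the second bullet plus an optional exponent.

```python
import io
import re

PAIR = re.compile(r'([a-zA-Z]\w*)\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)')

def parse_lines(lines):
    """Yield one {name: value-string} dict per line that contains pairs."""
    for line in lines:
        pairs = PAIR.findall(line)
        if pairs:                      # skip blank or non-matching lines
            yield dict(pairs)

# io.StringIO stands in for an open file here.
fake_file = io.StringIO("COM 0.95958  ehc 26.9817\n\nCOM 1.25  ehc 30.5\n")
records = list(parse_lines(fake_file))
print(records[1]['COM'])    # 1.25 (still a str)
```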

If there are multiple files and you want to keep all the mappings together, simply update the dict.

d = {}
for fn in filenames:
    with open(fn, 'r') as f:             # closes each file when done
        d.update(r.findall(f.read()))

Now d holds the name-value pairs from all files. If a name appears in more than one file, the value from the last file read wins.
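Because dict.update overwrites repeated names, you may prefer to keep one mapping per file. A sketch of that variant, using the '*.log' glob from the question; parse_logs is a hypothetical name, and the pattern assumes values may carry an exponent.

```python
import glob
import os
import re

PAIR = re.compile(r'(\w+)\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)')

def parse_logs(path):
    """Map each *.log file name under `path` to its {name: float} dict."""
    results = {}
    for filename in sorted(glob.glob(os.path.join(path, '*.log'))):
        with open(filename, 'rt') as in_file:
            pairs = PAIR.findall(in_file.read())
        # Key by base name so results are easy to look up.
        results[os.path.basename(filename)] = {k: float(v) for k, v in pairs}
    return results
```

With this, parse_logs(path)['run1.log']['COM'] would give the COM value from a file named run1.log (the file name is only an example).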

Upvotes: 1
