lokibd

Reputation: 193

Parsing a text file with Python, saving the numbers in arrays and exporting them to CSV

I want to parse some text in Python 2.7 and store the results in arrays. For example:

Record # 3741: 2018 Feb 16  13:26:27.632  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 468 H S16.7:Output [0] 0 [1] 0 [2] 0 [3] 0 
   Record # 3742: 2018 Feb 16  13:26:27.632  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 474 H S16.7:Output [4] 0 [5] 0 [6] 0 [7] 0 [8] 0 
   Record # 3795: 2018 Feb 16  13:26:27.633  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 468 H S16.7:Output [0] 16861 [1] 16867 [2] 16873 [3] 16878 
   Record # 3800: 2018 Feb 16  13:26:27.633  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 474 H S16.7:Output [4] 16873 [5] 16861 [6] 0 [7] 0 [8] 0 
   Record # 3931: 2018 Feb 16  13:26:27.634  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 468 H S16.7:Output [0] 16873 [1] 16867 [2] 128 [3] 128 
   Record # 3932: 2018 Feb 16  13:26:27.634  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 474 H S16.7:Output [4] 16878 [5] 16873 [6] 16855 [7] 16867 [8] 16873 
   Record # 3971: 2018 Feb 16  13:26:27.635  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 468 H S16.7:Output [0] 16855 [1] 16849 [2] 129 [3] 129 
   Record # 3974: 2018 Feb 16  13:26:27.635  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 474 H S16.7:Output [4] 16867 [5] 16867 [6] 16861 [7] 16861 [8] 16867

From these lines I want to parse the even lines (the ones starting with =>) and save the numbers in arrays. I should have 9 arrays (A0, A1, A2, ..., A8) and keep appending to them in a loop.

In the example above, A0 should hold the values that follow [0], A1 the values that follow [1], and so on:

Variables: 
A0 --> [0] numbers after [0]
A1 --> [1] numbers after [1]
A2 --> [2] numbers after [2]
A3 --> [3] numbers after [3]
A4 --> [4] numbers after [4]
A5 --> [5] numbers after [5]
A6 --> [6] numbers after [6]
A7 --> [7] numbers after [7]
A8 --> [8] numbers after [8]

At the end of the loop I will save the arrays to a CSV file.

My approach in Python is:

import re
from collections import defaultdict
import numpy as np
import matplotlib.pyplot as plt

result = defaultdict(list)
with open("C:\\Users\\ibrah\\Documents\\Python\\Test\\Input.txt","r") as f:
    for c, l in enumerate(f.readlines()): # Get enumerate to keep track of line numbers
        if (c+1) % 2 == 0: # Even lines (line numbers start at 0)
            for m in re.findall(r'(?:\[(\d+)\])\s(\d+)', l): # 1st regex group will be the number in the []s and 2nd group will be the number after
                result[int(m[0])].append(int(m[1]))
RS0 = 10*(np.log10(result[0]/128))
RS1 = 10*(np.log10(result[1]/128))
RS2 = 10*(np.log10(result[2]/128))
RS3 = 10*(np.log10(result[3]/128))
RS4 = 10*(np.log10(result[4]/128))
RS5 = 10*(np.log10(result[5]/128))
RS6 = 10*(np.log10(result[6]/128))
RS7 = 10*(np.log10(result[7]/128))
RS8 = 10*(np.log10(result[8]/128))
plt.plot([RS0,RS1,RS2,RS3,RS4,RS5,RS6,RS7,RS8])
#plt.plot([RS0])
plt.ylabel('SNR')
plt.show()

Here I am trying to plot the variables (RS0 ... RS8) and also export the values to a CSV file. Could you please help me finish my code so it performs these operations?
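For the CSV part, a minimal sketch with the standard csv module, assuming the parsed values live in a dict like result above (index -> list of readings). It writes one column per index and one row per reading; the dict below holds just a few indices from the sample data, and output.csv is a placeholder file name. It is written for Python 3; on Python 2.7 you would open the file with "wb" instead of newline="" and use itertools.izip_longest.

```python
import csv
from itertools import zip_longest

# Stand-in for the parsed `result` dict: a few indices from the sample data
result = {
    0: [0, 16861, 16873, 16855],
    4: [0, 16873, 16878, 16867],
    8: [0, 0, 16873, 16867],
}

keys = sorted(result)
with open("output.csv", "w", newline="") as f:  # placeholder file name
    writer = csv.writer(f)
    writer.writerow(["A%d" % k for k in keys])  # header row: A0, A4, A8
    # One row per reading; zip_longest pads shorter columns with blanks
    for row in zip_longest(*(result[k] for k in keys), fillvalue=""):
        writer.writerow(row)
```

The same loop works for the RS arrays if you write those instead of the raw readings.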

Upvotes: 0

Views: 82

Answers (2)

ktdrv

Reputation: 3673

You could use a better regex to grab all occurrences of [X] Y. Combine that with a with statement, enumerate() and a defaultdict(list) to keep your code clean, and you get something like this:

import re
from collections import defaultdict

result = defaultdict(list)
with open(r'C:\Users\xxxx\Contents.txt', 'r') as f: # raw string so the backslashes are not read as escapes
    for c, l in enumerate(f.readlines()): # Get enumerate to keep track of line numbers
        if (c+1) % 2 == 0: # Even lines (line numbers start at 0)
            for m in re.findall(r'(?:\[(\d+)\])\s(\d+)', l): # 1st regex group will be the number in the []s and 2nd group will be the number after
                result[int(m[0])].append(int(m[1]))

A0 = result[0]
A1 = result[1]
# ...

If you want to plot the resulting arrays afterwards, first convert the Python lists to NumPy arrays via np.array(result[0]). Otherwise, you'd get:

TypeError: unsupported operand type(s) for /: 'list' and 'int'
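A minimal illustration of that error and the fix, using a few of the [0] readings from the question. Since the question mentions Python 2.7, the sketch also divides by 128.0 rather than 128: on Python 2, dividing an integer array by an integer truncates, so a reading of 129 would come out as 1 instead of about 1.008.

```python
import numpy as np

# A few of the [0] readings from the sample data
a0 = [16861, 16873, 16855]

try:
    a0 / 128  # a plain Python list does not support division
except TypeError as exc:
    message = str(exc)
print(message)  # unsupported operand type(s) for /: 'list' and 'int'

# Convert to a NumPy array first; dividing by 128.0 (a float) keeps the
# result floating-point on Python 2.7 too, where int / int truncates
ratio = np.array(a0) / 128.0
print(ratio.dtype)  # float64
```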

Then, make sure you handle the 0s in the array before you take their logarithm: log(0) is undefined, and np.log10(0) returns -inf (with a RuntimeWarning). You could try adding 1 to the values and then taking their log.
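A sketch of two ways to deal with the zeros, using the [2] readings from the sample data. Option 1 is the add-1 idea; option 2 (masking zeros as NaN) is an alternative I'm adding here, not something from the question.

```python
import numpy as np

# The [2] readings from the sample data (note the leading 0)
a2 = np.array([0, 16873, 128, 129], dtype=float)

# Taking the log directly: log10(0) is -inf (errstate only silences the warning)
with np.errstate(divide="ignore"):
    raw = 10 * np.log10(a2 / 128.0)

# Option 1: add 1 before taking the log, so every value stays finite
snr_plus1 = 10 * np.log10((a2 + 1) / 128.0)

# Option 2: treat zeros as missing, so they become NaN
snr_nan = 10 * np.log10(np.where(a2 > 0, a2, np.nan) / 128.0)

print(raw[0], snr_nan[0])  # -inf nan
```

Adding 1 keeps every point finite, but a zero reading then shows up as about -21 dB, which can look like real data on a plot. Matplotlib leaves NaN points out of a line, so option 2 makes the missing readings visible as gaps.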

Finally, you don't need to create all of those variables -- a list comprehension would work much more elegantly:

plt.plot([10*np.log10(np.array(result[i])/128) for i in sorted(result.keys())])
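One thing to watch: when plt.plot gets a list of equal-length arrays, it treats it as a 2D array and draws one line per column, i.e. one line per reading across the nine indices rather than one line per index. If the goal is one SNR curve per index, a loop is a simple option. The sketch below uses a few indices from the sample data as a stand-in for result, replaces zeros with NaN so they are skipped instead of becoming -inf, and saves to snr.png (a placeholder name); the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the parsed `result` dict: a few indices from the sample data
result = {
    0: [0, 16861, 16873, 16855],
    2: [0, 16873, 128, 129],
    6: [0, 0, 16855, 16861],
}

fig, ax = plt.subplots()
for i in sorted(result):
    values = np.array(result[i], dtype=float)
    values[values == 0] = np.nan  # skip zeros instead of plotting -inf
    ax.plot(10 * np.log10(values / 128.0), marker="o", label="A%d" % i)

ax.set_xlabel("Reading")
ax.set_ylabel("SNR")
ax.legend()
fig.savefig("snr.png")  # placeholder file name; use plt.show() interactively
```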

Upvotes: 2

bla

Reputation: 1870

If, for each line beginning with =>, you want the numbers after Output as a list of ints, you could do something like this:

import re

with open('file.txt') as f:
    # lstrip() so lines with leading whitespace before "=>" are kept too
    lines = [l for l in f if l.lstrip().startswith('=>')]

parsed_lines = []
for line in lines:
    # Everything after "Output" alternates: index, value, index, value, ...
    numbers = re.findall(r'\d+', line.split('Output')[1])
    parsed_lines.append([int(e) for e in numbers])

print(parsed_lines)

So, for the sample data in the question you would get:

[
    [0, 0, 1, 0, 2, 0, 3, 0],
    [4, 0, 5, 0, 6, 0, 7, 0, 8, 0],
    [0, 16861, 1, 16867, 2, 16873, 3, 16878],
    [4, 16873, 5, 16861, 6, 0, 7, 0, 8, 0],
    [0, 16873, 1, 16867, 2, 128, 3, 128],
    [4, 16878, 5, 16873, 6, 16855, 7, 16867, 8, 16873],
    [0, 16855, 1, 16849, 2, 129, 3, 129],
    [4, 16867, 5, 16867, 6, 16861, 7, 16861, 8, 16867]
]
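To get from these flat lists to the per-index arrays the question asks for (A0 ... A8), the numbers can be paired up by position. A short sketch using two of the lists above; channels is just an illustrative name.

```python
from collections import defaultdict

# Two of the parsed lists shown above
parsed_lines = [
    [0, 16861, 1, 16867, 2, 16873, 3, 16878],
    [4, 16873, 5, 16861, 6, 0, 7, 0, 8, 0],
]

channels = defaultdict(list)
for numbers in parsed_lines:
    # Pair up [index, value, index, value, ...] as (index, value)
    for index, value in zip(numbers[0::2], numbers[1::2]):
        channels[index].append(value)

print(dict(channels))
# {0: [16861], 1: [16867], 2: [16873], 3: [16878], 4: [16873], 5: [16861], 6: [0], 7: [0], 8: [0]}
```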

Upvotes: -1
