Reputation: 1
A Python question. I have the formatted table below (the stars are only there to draw attention; they are not really in the table):
Step Time Apple_price fluctuation
BFGS: 0 18:21:43 -6442.333161 7.4744
BFGS: 1 18:21:43 *-6442.899477 5.8484*
Step Time Apple_price fluctuation
BFGS: 0 18:21:53 -6441.911200 16.3190
BFGS: 1 18:21:53 -6442.540975 10.6048
BFGS: 2 18:21:53 -6443.107163 7.6685
BFGS: 3 18:21:53 -6443.565044 6.2186
BFGS: 4 18:21:54 *-6443.954663 5.7485*
Step Time Apple_price fluctuation
BFGS: 0 18:27:00 -6440.611426 24.6802
BFGS: 1 18:27:00 -6441.602767 21.3009
BFGS: 2 18:27:00 -6442.446886 15.6698
BFGS: 3 18:27:01 -6443.084822 11.6312
BFGS: 4 18:27:01 -6443.582671 8.6795
BFGS: 5 18:27:01 -6444.019236 7.4906
BFGS: 6 18:27:01 -6444.389951 6.7435
BFGS: 7 18:27:02 *-6444.732455 6.5221*
I would like to extract the values between "*" as follows:
-6442.899477 5.8484
-6443.954663 5.7485
-6444.732455 6.5221
my code is as follows:
import pandas as pd
import numpy as np
all_lines = []
file_name = input("What's the file name with extension?: ")
with open (f'{file_name}', 'r') as file:
    for each_line in file:
        all_lines.append(each_line.strip())
#print(all_lines)
for j in all_lines:
    if j == 0:
        j = j + 1
    if 'fluctuation' in i:
        all_lines.index(j-1)
        print(j)
Unfortunately, the output is only the first line of the expected answer:
-6442.899477 5.8484
Please let me know how I can extract the values at given indices from a list.
Upvotes: 0
Views: 58
Reputation: 1
I think that I found a simple solution:
1- in bash
awk '{$1=$2=$3=""; print $0}' filename.out > filename.out2
2- add a line containing only "Error" at the end of filename.out2
3- run the following code
import numpy as np
import pandas as pd
all_lines = []
with open('filename.out2', 'r') as f:
    for each_line in f:
        all_lines.append(each_line.strip())
#for j in all_lines:
#    print(j)
df = pd.DataFrame(all_lines)
count_row = df.shape[0] # Gives number of rows
print("count_row=", count_row)
count_col = df.shape[1] # Gives number of columns
print("count_col=", count_col)
max_sw = 'Error'
# indices of every line that equals the marker text
lines = [i for i in range(len(all_lines)) if all_lines[i] == max_sw]
#print([i for i in range(len(all_lines)) if all_lines[i]== max_sw])
print(lines)
# the wanted row is the one just before each marker line
lines2 = []
for i in lines:
    i = i - 1
    lines2.append(i)
print(lines2)
lines3 = []
for i in lines2:
    if i != -1:  # the first marker has no row before it
        # print(i)
        # lines3 = [i for i in all_lines[i]]
        # return
        lines3.append(all_lines[i])
print (lines3)
4- the answer:
count_row= 19
count_col= 1
[0, 3, 9, 18]
[-1, 2, 8, 17]
['-6442.899477 5.8484', '-6443.954663 5.7485', '-6444.732455 6.5221']
anyway, I welcome any new help.
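For what it's worth, the same idea (take the row just before each header line, plus the very last row) can probably be done in Python alone, without the awk step or the extra marker line. This is only a minimal sketch: it assumes every block starts with a header line containing the word "fluctuation" and that the two wanted values are the last two columns of a row. The function name last_rows and the shortened sample are made up for illustration.

```python
def last_rows(lines, header_word="fluctuation"):
    """Return the last two columns of the final row of each block."""
    results = []
    previous = None  # most recent data row seen in the current block
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if header_word in line:
            # A header starts a new block, so the row before it closed the old one.
            if previous is not None:
                results.append(previous)
            previous = None
        else:
            # Keep only the last two whitespace-separated columns.
            previous = " ".join(line.split()[-2:])
    if previous is not None:
        results.append(previous)  # the final block has no header after it
    return results


# Shortened sample in the same layout as the table in the question.
sample = """Step Time Apple_price fluctuation
BFGS: 0 18:21:43 -6442.333161 7.4744
BFGS: 1 18:21:43 -6442.899477 5.8484
Step Time Apple_price fluctuation
BFGS: 0 18:21:53 -6441.911200 16.3190
BFGS: 4 18:21:54 -6443.954663 5.7485"""

for row in last_rows(sample.splitlines()):
    print(row)
```

To run it on a file, an open file object can be passed directly, since the function only iterates over lines: last_rows(open_file).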
Upvotes: 0
Reputation: 350
Import the regular expression module:
import re
Preparing data:
text = """ Step Time Apple_price fluctuation
BFGS: 0 18:21:43 -6442.333161 7.4744
BFGS: 1 18:21:43 *-6442.899477 5.8484*
Step Time Apple_price fluctuation
BFGS: 0 18:21:53 -6441.911200 16.3190
BFGS: 1 18:21:53 -6442.540975 10.6048
BFGS: 2 18:21:53 -6443.107163 7.6685
BFGS: 3 18:21:53 -6443.565044 6.2186
BFGS: 4 18:21:54 *-6443.954663 5.7485*
Step Time Apple_price fluctuation
BFGS: 0 18:27:00 -6440.611426 24.6802
BFGS: 1 18:27:00 -6441.602767 21.3009
BFGS: 2 18:27:00 -6442.446886 15.6698
BFGS: 3 18:27:01 -6443.084822 11.6312
BFGS: 4 18:27:01 -6443.582671 8.6795
BFGS: 5 18:27:01 -6444.019236 7.4906
BFGS: 6 18:27:01 -6444.389951 6.7435
BFGS: 7 18:27:02 *-6444.732455 6.5221*"""
Define the regular expression: the characters that may appear between the asterisks, captured in a group so that the asterisks themselves are not returned:
p = re.compile(r'\*([- 0-9.]*)\*')
Match the regular expression against the text:
a = p.findall(text)
a is a list of the matches; enumerate yields each index and its content:
for k, v in enumerate(a):
    print(k, v)
Output:
0 -6442.899477 5.8484
1 -6443.954663 5.7485
2 -6444.732455 6.5221
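If numbers are needed rather than strings, each match can be split and converted to floats. The following is only a sketch: it assumes the asterisks really are present in the text and that exactly two numbers sit between each pair of them (anything else would raise a ValueError). It uses a capture group so the asterisks are not part of the result. The names MARKED and parse_marked are hypothetical.

```python
import re

# Capture whatever digits, dots, minus signs and spaces sit between two asterisks.
MARKED = re.compile(r'\*([- 0-9.]*)\*')


def parse_marked(text):
    """Return a list of (price, fluctuation) float pairs found between asterisks."""
    pairs = []
    for chunk in MARKED.findall(text):
        # Each chunk is expected to hold exactly two numbers separated by spaces.
        price, fluctuation = (float(x) for x in chunk.split())
        pairs.append((price, fluctuation))
    return pairs


# Shortened sample in the same layout as the text above.
sample = """BFGS: 0 18:21:43 -6442.333161 7.4744
BFGS: 1 18:21:43 *-6442.899477 5.8484*
BFGS: 4 18:21:54 *-6443.954663 5.7485*"""

print(parse_marked(sample))
```

Returning tuples of floats keeps the two columns together, which makes later steps such as sorting or averaging simpler than working with the raw strings.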
Upvotes: 1