pandas/numpy selection of values of indices

I have a Python question. Below is a formatted table (the stars only draw attention to the values I want and are not part of the actual table):

   Step  Time          Apple_price         fluctuation 
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43   *-6442.899477        5.8484*
      Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    1 18:21:53    -6442.540975       10.6048
BFGS:    2 18:21:53    -6443.107163        7.6685
BFGS:    3 18:21:53    -6443.565044        6.2186
BFGS:    4 18:21:54    *-6443.954663        5.7485*
      Step     Time      Apple_price         fluctuation
BFGS:    0 18:27:00    -6440.611426       24.6802
BFGS:    1 18:27:00    -6441.602767       21.3009
BFGS:    2 18:27:00    -6442.446886       15.6698
BFGS:    3 18:27:01    -6443.084822       11.6312
BFGS:    4 18:27:01    -6443.582671        8.6795
BFGS:    5 18:27:01    -6444.019236        7.4906
BFGS:    6 18:27:01    -6444.389951        6.7435
BFGS:    7 18:27:02   *-6444.732455        6.5221*

I would like to extract the values between "*" as follows:

    -6442.899477        5.8484
    -6443.954663        5.7485
    -6444.732455        6.5221

My code is as follows:

import pandas as pd
import numpy as np


all_lines = []                                   
file_name = input("What's the file name with extension?: ")
with open (f'{file_name}', 'r') as file:                     
    for each_line in file:
        all_lines.append(each_line.strip())
        

#print(all_lines)

for j in all_lines:
    if j == 0:
        j = j + 1
        if 'fluctuation' in i:
            all_lines.index(j-1)
print(j)

Unfortunately, the output is only the first line of the expected answer:

-6442.899477 5.8484

How can I extract the values at the relevant indices of the list?

Upvotes: 0

Views: 58

Answers (2)

I think that I found a simple solution:

1- in bash

    awk '{$1=$2=$3=""; print $0}' filename.out > filename.out2

2- type "Error" as the last line of filename.out2

3- run the following code

import pandas as pd

# Read the awk-filtered file, one stripped line per list entry.
with open('filename.out2', 'r') as f:
    all_lines = [each_line.strip() for each_line in f]

df = pd.DataFrame(all_lines)
count_row = df.shape[0]         # number of rows
print("count_row=", count_row)

count_col = df.shape[1]         # number of columns
print("count_col=", count_col)

# Indices of the lines that equal the marker string.
max_sw = 'Error'
lines = [i for i, line in enumerate(all_lines) if line == max_sw]
print(lines)

# The wanted values sit on the line just before each marker.
lines2 = [i - 1 for i in lines]
print(lines2)

# Skip -1, which appears when a marker is on the very first line.
lines3 = [all_lines[i] for i in lines2 if i != -1]
print(lines3)

4- the answer:

count_row= 19

count_col= 1

[0, 3, 9, 18]

[-1, 2, 8, 17]

['-6442.899477 5.8484', '-6443.954663 5.7485', '-6444.732455 6.5221']

Anyway, I welcome any further help.
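For comparison, the same rows can probably be collected in Python alone, without the awk step or the "Error" marker. This is only a sketch: it assumes every block starts with a header line containing "fluctuation" and that the wanted row is the last data line before the next header (or before the end of the file). The helper name last_rows and the shortened sample text are made up for illustration.

```python
def last_rows(lines, header_key="fluctuation"):
    """Return (Apple_price, fluctuation) from the last data row of each block.

    Assumes a block is a header line containing `header_key` followed by
    data rows; `last_rows` is a hypothetical helper, not a library function.
    """
    results = []
    previous = None  # most recent data row seen in the current block
    for raw in lines:
        line = raw.strip().replace("*", "")  # drop the highlighting stars, if any
        if not line:
            continue
        if header_key in line:
            # A new header closes the previous block.
            if previous is not None:
                results.append(previous)
            previous = None
        else:
            # Keep only the last two columns of a data row.
            price, fluct = line.split()[-2:]
            previous = (float(price), float(fluct))
    if previous is not None:  # close the final block
        results.append(previous)
    return results


# Shortened sample in the same layout as the table in the question.
sample = """\
   Step  Time          Apple_price         fluctuation
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43    -6442.899477        5.8484
      Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    4 18:21:54    -6443.954663        5.7485
"""
print(last_rows(sample.splitlines()))
```

To run it on a real file, the lines can be read with open(file_name) and passed in directly, since the function strips each line itself.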

Upvotes: 0

Import the regular expression module:

import re

Preparing data:

text = """   Step  Time          Apple_price         fluctuation 
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43   *-6442.899477        5.8484*
      Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    1 18:21:53    -6442.540975       10.6048
BFGS:    2 18:21:53    -6443.107163        7.6685
BFGS:    3 18:21:53    -6443.565044        6.2186
BFGS:    4 18:21:54    *-6443.954663        5.7485*
      Step     Time      Apple_price         fluctuation
BFGS:    0 18:27:00    -6440.611426       24.6802
BFGS:    1 18:27:00    -6441.602767       21.3009
BFGS:    2 18:27:00    -6442.446886       15.6698
BFGS:    3 18:27:01    -6443.084822       11.6312
BFGS:    4 18:27:01    -6443.582671        8.6795
BFGS:    5 18:27:01    -6444.019236        7.4906
BFGS:    6 18:27:01    -6444.389951        6.7435
BFGS:    7 18:27:02   *-6444.732455        6.5221*"""

Define the regular expression: two stars enclosing only digits, spaces, minus signs and dots. The parentheses capture just the part between the stars:

p = re.compile(r'\*([- 0-9.]*)\*')

Match the regular expression against the text:

a = p.findall(text)

a is the list of matches; enumerate yields each index together with its match:

for k, v in enumerate(a):
    print(k, v)

Output:

0 -6442.899477 5.8484
1 -6443.954663 5.7485
2 -6444.732455 6.5221
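If the numbers are needed for further calculations rather than as text, the matched strings can be split and converted to floats. A minimal sketch, assuming the pattern uses a capture group so the stars are dropped from each match; the short text is a cut-down sample of the table and the name rows is only illustrative.

```python
import re

# Cut-down sample of the table, with stars around the wanted values.
text = """BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43   *-6442.899477        5.8484*
BFGS:    4 18:21:54    *-6443.954663        5.7485*"""

# The parentheses capture only what sits between the two stars.
pattern = re.compile(r'\*([- 0-9.]*)\*')

# Each match is two numbers separated by a run of spaces;
# split() with no argument handles any amount of whitespace.
rows = [tuple(float(x) for x in m.split()) for m in pattern.findall(text)]

for price, fluct in rows:
    print(price, fluct)
```

Since rows is a plain list of (Apple_price, fluctuation) pairs, it can be handed to pd.DataFrame(rows, columns=["Apple_price", "fluctuation"]) if a table is wanted.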

Upvotes: 1
