Reputation: 29
I have a variable whose contents look something like this:
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;5
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
I want to read the lines that start with Main_data, take only the last column of each, and store those values in a list. Please note that this data is held in a variable; it is not a file.
My Desired Output:
Some_list=[1,2,3,4,5]
I thought of using something like this.
for line in var_a.splitlines():
    if "Main_data1" in line:
        print(line)
But there are more than 200 lines from which I need to read the last column. What would be an efficient way of doing this?
Upvotes: 0
Views: 150
Reputation: 5292
My approach uses a regex, since it gives more control over the pattern.
File content
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ******** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ******** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Code
import re
fl = open(r"C:\text.txt", 'r')  # text mode, so each line is a str the pattern can search
pattern = r'Main_data.*(?<=;)([0-9]{1,})'
data = []
for line in fl.readlines():
    # capture the digits after the last ';' on any line containing Main_data
    match = re.search(pattern, line, re.IGNORECASE | re.MULTILINE)
    if match:
        data.append(match.group(1))
    else:
        data.append('N')  # placeholder for a junk line
fl.close()
strng = ','.join(data)  # join the list into one string
lsts = re.findall(r'(?<=,)[0-9,]+(?=,)', strng)  # extract runs of values, excluding 'N'
outpt = [i.split(',') for i in lsts]  # generate the final list of lists
print(outpt)
Output
[['1', '2', '3', '4', '523233'], ['1', '2', '3', '4', '523233'], ['1', '2', '3', '4', '523233']]
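Since the question says the data lives in a variable rather than a file, here is a shorter sketch of the same regex idea applied directly to a string. The name var_a and the sample text are assumptions for illustration, and the pattern assumes each data line ends right after the digits (no trailing spaces or carriage returns). Note that it returns one flat list instead of one list per block.

```python
import re

# Hypothetical sample standing in for the asker's variable.
var_a = """**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
**** SOME JUNK DATA ****
Main_data5;aaaaa;bbbbb;cccc;d;e;523233
"""

# With re.MULTILINE, ^ and $ anchor at every line, so only lines that
# start with Main_data match; the group captures the digits after the last ';'.
pattern = re.compile(r'^Main_data.*;(\d+)$', re.MULTILINE)

last_column = [int(n) for n in pattern.findall(var_a)]
print(last_column)  # [1, 2, 523233]
```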
Upvotes: 0
Reputation: 107287
You can use a list comprehension to store the numbers:
my_list = [int(line.strip().split(';')[-1]) for line in my_var.split('\n') if line.startswith('Main_data5')]
Also note that str.startswith() is a more Pythonic choice here than the in operator, because in would also match a line that happens to have Main_data5 somewhere in the middle.
If a line can start in one of two ways, you can combine two startswith conditions with the or operator:
my_list = [int(line.strip().split(';')[-1]) for line in my_var.split('\n') if line.startswith('Main_data5') or line.startswith('Main_data1')]
But if you have more keywords, you can use a regex. For example, to match all the lines that start with Main_data followed by a digit, you can use re.match():
import re
my_list = [int(line.strip().split(';')[-1]) for line in my_var.split('\n') if re.match(r'Main_data\d.*',line)]
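As a variation on the two-prefix case above, str.startswith() also accepts a tuple of prefixes, which avoids chaining or conditions. The sketch below assumes a small made-up my_var; the prefixes tuple is an illustrative name, not something from the question.

```python
# Hypothetical sample data; my_var stands in for the variable from the question.
my_var = (
    "**** SOME JUNK DATA ****\n"
    "Main_data1;a;b;c;dss;e;1\n"
    "xx Main_data5 in the middle;9\n"
    "Main_data5;aaaaa;bbbbb;cccc;d;e;5\n"
    "**** SOME JUNK DATA ****"
)

# startswith() returns True if the line begins with any prefix in the tuple.
prefixes = ('Main_data1', 'Main_data5')

my_list = [
    int(line.rsplit(';', 1)[-1])   # last ';'-separated field
    for line in my_var.splitlines()
    if line.startswith(prefixes)
]
print(my_list)  # [1, 5]
```

The line with Main_data5 in the middle is skipped, which is the case the in operator would get wrong.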
Upvotes: 1
Reputation: 2914
my_list = []
for line in my_var.strip().split('\n'):
    if "Main_data1" in line:
        my_list.append(int(line.split(";")[-1]))
    else:
        continue
Or you can use the startswith('match') method, as someone else mentioned.
Upvotes: 0
Reputation: 12077
Check whether the line starts with "Main_data", then split it on the semicolon ; and take the last element with index -1:
some_list = []
for line in var_a.split("\n"):
    if line.startswith("Main_data"):
        some_list.append(int(line.split(";")[-1]))
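For completeness, here is the same loop as a runnable sketch using the sample data from the question. The function name last_column and its prefix parameter are hypothetical, added only so the example is self-contained. For a few hundred lines, a single pass like this is cheap.

```python
def last_column(text, prefix="Main_data"):
    """Return the last ';'-separated field of each line starting with prefix, as ints."""
    values = []
    for line in text.splitlines():
        if line.startswith(prefix):
            # rsplit with maxsplit=1 only splits off the final field
            values.append(int(line.rsplit(";", 1)[-1]))
    return values


# Sample data copied from the question.
var_a = """\
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
Main_data1;a;b;c;dss;e;1
Main_data2;aa;bb;sdc;d;e;2
Main_data3;aaa;bbb;ccce;d;e;3
Main_data4;aaaa;bbbb;cc;d;e;4
Main_data5;aaaaa;bbbbb;cccc;d;e;5
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
**** SOME JUNK DATA ****
"""

some_list = last_column(var_a)
print(some_list)  # [1, 2, 3, 4, 5]
```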
Upvotes: 1