Reputation: 97
I have a text file containing text like
index of cluster is 18585 points index are [18585, 14290, 18503, 7220, 6835, 10009,6615, 1269, 14161, 26545, 18140, 9292, 20355, 16401, 7713, 582, 1865, 17247, 26256, 19034, 7282, 1847, 19293, 16944, 27748, 29312,.... ]
index of the cluster is 3014 points index are [ ....] and so on ..
I need to extract the numbers between "[" and "]" for every cluster in the file. I tried checking whether each line contains "[" and then getting the numbers, but it didn't work:
import os
f = open("cluster.txt","r")
for line in f.readlines():
    if "[" in line:
        print("true")
Upvotes: 0
Views: 1078
Reputation: 19
You could alternatively do the following:
lst = []
with open("cluster.txt", "r") as f:
    for line in f:
        # take the text between "[" and "]", split on commas, convert to int
        lst += list(map(int, line.split("[")[1].split("]")[0].split(",")))
print(lst)
lst ends up holding the numbers from every line of the file in one flat list. map converts the extracted strings to integers; because it returns an iterator in Python 3, the result is wrapped in list() before being added to the main list.
Upvotes: -1
Reputation: 71
You can do something like this:
f = open("cluster.txt","r")
for line in f.readlines():
    numbers_only = line.split('[')[1].split(']')[0]
    list_of_number_strings = numbers_only.split(',')
    list_of_numbers = [int(number) for number in list_of_number_strings]
With this, for each line you will have the numbers converted to integers in the list_of_numbers list. First, this splits the line to get only the part between [ and ], then it splits the remainder on commas and converts each piece to an integer. This assumes that each line contains a list. If some lines have a different format, you would need to add some additional logic for those cases.
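As a rough sketch of what that additional logic might look like: the helper below (extract_numbers is a made-up name, not part of the code above) returns None for a line with no bracketed list and skips empty or non-numeric pieces such as a trailing comma. It assumes the indices are non-negative integers, so a token like "-5" would be skipped too.

```python
def extract_numbers(line):
    """Return the integers between the first '[' and the next ']', or None.

    Hypothetical helper; assumes non-negative integer indices.
    """
    start = line.find("[")
    end = line.find("]", start + 1)
    if start == -1 or end == -1:
        return None  # no bracketed list on this line
    tokens = line[start + 1:end].split(",")
    # Keep only tokens made of digits; empty or non-numeric pieces are skipped
    return [int(t) for t in tokens if t.strip().isdigit()]
```

Inside the loop you could then call extract_numbers(line) and simply continue when it returns None.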
Upvotes: 2
Reputation: 27043
For each line in the file you can use a regular expression to identify the data within the brackets. Then you can split the resulting string and use a list comprehension (or a map as shown here) to give you a list of all the numbers.
For example:
import re
line = '''index of cluster is 18585 points index are [18585, 14290, 18503, 7220, 6835, 10009,6615, 1269, 14161, 26545, 18140, 9292, 20355, 16401, 7713, 582, 1865, 17247, 26256, 19034, 7282, 1847, 19293, 16944, 27748, 29312]'''
a = re.findall(r'\[(.*?)\]', line)
if a:
    nums = list(map(int, a[0].split(',')))
    print(nums)
Output:
[18585, 14290, 18503, 7220, 6835, 10009, 6615, 1269, 14161, 26545, 18140, 9292, 20355, 16401, 7713, 582, 1865, 17247, 26256, 19034, 7282, 1847, 19293, 16944, 27748, 29312]
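To apply the same idea to every line rather than a single string, something along these lines might work. This is only a sketch: numbers_per_line and BRACKETS are names invented here, and the sample lines are shortened stand-ins for the file contents.

```python
import re

# Non-greedy match of whatever sits between the first "[" and the next "]"
BRACKETS = re.compile(r"\[(.*?)\]")

def numbers_per_line(lines):
    """Yield one list of ints for each line that contains a bracketed list."""
    for line in lines:
        match = BRACKETS.search(line)
        if match:
            # Ignore empty pieces, e.g. after a trailing comma
            yield [int(tok) for tok in match.group(1).split(",") if tok.strip()]

sample = [
    "index of cluster is 18585 points index are [18585, 14290, 18503]",
    "a line without a list",
    "index of the cluster is 3014 points index are [3014, 7]",
]
result = list(numbers_per_line(sample))
print(result)  # [[18585, 14290, 18503], [3014, 7]]
```

Since a file object is iterable line by line, you could pass the open file in place of sample, for instance with open("cluster.txt") as f: result = list(numbers_per_line(f)).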
Upvotes: 2