Reputation: 55
I have a txt file which is composed of text and numbers. It looks something like this:
> this is a paragraph which is introductory which lasts
some more lines
text text text
567 45 32 468
974 35 3578 4467
325 765 355 5466
text text text
1 3 6
text text>
What I need is to store the rows that contain 4 numeric elements.
When I use the read command, all elements are read and stored as strings. I'm not sure whether I can convert the numbers to integers without filtering them first.
I would appreciate any help. Thanks.
Upvotes: 3
Views: 2649
Reputation: 17322
You can use a regular expression:
import re
result = []
with open('file_name.txt') as fp:
    for line in fp:
        # \d{4} matches a run of four consecutive digits anywhere in the line
        if re.search(r'\d{4}', line):
            result.append(line.strip())
print(result)
output:
['974 35 3578 4467', '325 765 355 5466']
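Note that the pattern `\d{4}` selects lines containing a number with at least four digits, which is why `567 45 32 468` is missing from the output. If the goal is rows made of exactly four integers, a minimal sketch could look like the following. The helper name `rows_with_four_numbers` and the inline sample text are assumptions for illustration only.

```python
import re

# Four runs of digits separated by whitespace, with nothing else on the line.
FOUR_INTS = re.compile(r'\d+(?:\s+\d+){3}')

def rows_with_four_numbers(lines):
    """Return the stripped lines that consist of exactly four integers."""
    result = []
    for line in lines:
        stripped = line.strip()
        # fullmatch() requires the whole stripped line to fit the pattern
        if FOUR_INTS.fullmatch(stripped):
            result.append(stripped)
    return result

# Inline sample standing in for the file contents (assumption)
sample = """text text text
567 45 32 468
974 35 3578 4467
325 765 355 5466
text text text
1 3 6
text text
"""

print(rows_with_four_numbers(sample.splitlines()))
# ['567 45 32 468', '974 35 3578 4467', '325 765 355 5466']
```

With a real file, passing the open file object instead of `sample.splitlines()` should work the same way, since the helper only iterates over lines.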
Upvotes: 0
Reputation: 646
A regular expression works well here. We create a pattern using re.compile and then use its search or match method to look for the pattern in the string.
import re
p = re.compile(r'\d{4}')  # \d matches a single digit; {4} requires 4 consecutive occurrences
line_with_digits = []
with open('data.txt', 'r') as file:  # open the file; it is closed automatically afterwards
    for line in file:  # read the file line by line
        if p.search(line):  # search for the pattern anywhere in the line
            line_with_digits.append(line.strip())  # on a match, add the line to the list
print(line_with_digits)
The input file for the above program is:
text text text
567 45 32 468
974 35 3578 4467
325 765 355 5466
text text text
1 3 6
text text
text 5566 text 45 text
text text 564 text 458 25 text
The output is:
['974 35 3578 4467', '325 765 355 5466', 'text 5566 text 45 text']
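Since both `search` and `match` are mentioned, here is a small sketch of how they differ on one of the lines above. It also shows `fullmatch`, which is not used in this answer and is included only for comparison.

```python
import re

p = re.compile(r'\d{4}')
line = 'text 5566 text 45 text'

# search() scans the whole string for the first place the pattern matches.
found = p.search(line)
print(found.group())   # 5566

# match() only tries at the very start of the string, so it fails here.
print(p.match(line))   # None

# fullmatch() requires the entire string to match the pattern.
print(p.fullmatch('5566') is not None)   # True
```

That is why `search` is the right choice when the four digits can appear anywhere in the line.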
Hope this helps.
Upvotes: 0
Reputation: 5202
Use the splitlines() method.
A = open('your file here', 'r').read().splitlines()
This gives a list of lines, from which you can extract whatever you need. For example:
Req = []
for i in A:
    # one True/False flag per space-separated element of the line
    elem = [s.isnumeric() for s in i.split(' ')]
    if len(elem) == 4 and all(elem):
        Req.append(i)
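One thing to watch for: `split(' ')` splits on every single space, so a row with two spaces in a row (or a trailing space) produces empty strings and fails the check. A small sketch of the difference, using a made-up row with a double space:

```python
row = '567  45 32 468'   # hypothetical row; note the double space after 567

by_space = row.split(' ')   # splits on each single space
by_any = row.split()        # splits on any run of whitespace

print(by_space)   # ['567', '', '45', '32', '468']
print(by_any)     # ['567', '45', '32', '468']

# The empty string is not numeric, so the first variant rejects the row.
print(len(by_space) == 4 and all(s.isnumeric() for s in by_space))   # False
print(len(by_any) == 4 and all(s.isnumeric() for s in by_any))       # True
```

If the file may contain irregular spacing, `i.split()` is probably the safer choice.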
Upvotes: 1
Reputation: 36430
To me this sounds like a task for the re module. I would do:
import re
with open('yourfile.txt', 'r') as f:
    txt = f.read()
lines_w_4_numbers = re.findall(r'^\d+\s\d+\s\d+\s\d+$', txt, re.M)
print(lines_w_4_numbers)
Output:
['567 45 32 468', '974 35 3578 4467', '325 765 355 5466']
Explanation: the re.M flag means ^ and $ will match at the start/end of each line, \s denotes whitespace, and \d+ denotes 1 or more digits.
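To make the effect of `re.M` visible, here is a small sketch on a made-up string. It also shows a corner case: `\s` matches a newline too, so the pattern can span two lines. The `[ \t]` variant is my assumption about how one might avoid that, not part of the answer above.

```python
import re

# Hypothetical text: "1 2 3" followed by a line holding just "4"
txt = "1 2 3\n4\n10 20 30 40\n"

pattern_s = r'^\d+\s\d+\s\d+\s\d+$'

# Without re.M, ^ and $ only anchor at the start/end of the whole string.
print(re.findall(pattern_s, txt))         # []

# With re.M they anchor at every line, but \s may still consume a newline.
print(re.findall(pattern_s, txt, re.M))   # ['1 2 3\n4', '10 20 30 40']

# Restricting the separator to spaces and tabs keeps each match on one line.
pattern_blank = r'^\d+[ \t]\d+[ \t]\d+[ \t]\d+$'
print(re.findall(pattern_blank, txt, re.M))   # ['10 20 30 40']
```

For the sample file in the question this makes no difference, since no three-number line is directly followed by a line starting with a number.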
Upvotes: 0
Reputation: 435
If you know how to use Python's regex module, you can do this:
import re
TEST_FILE = 'test_file.txt'  # placeholder: path to your input file
if __name__ == '__main__':
    with open(TEST_FILE, 'r') as file_1:
        for line in file_1:
            if re.match(r'(\d+\s){4}', line):
                line = line.strip()  # remove \n character
                print(line)  # just lines with four numbers are printed
The result for your example file is:
567 45 32 468
974 35 3578 4467
325 765 355 5466
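Two properties of the pattern `(\d+\s){4}` may be worth checking against your data: it needs a whitespace character after the fourth number (normally the newline), and `re.match` only inspects the start of the line. A short sketch with made-up inputs:

```python
import re

pattern = r'(\d+\s){4}'

# Typical line read from a file: the trailing \n satisfies the last \s.
with_newline = bool(re.match(pattern, '567 45 32 468\n'))

# Last line of a file without a final newline: nothing follows the 4th number.
without_newline = bool(re.match(pattern, '567 45 32 468'))

# Five numbers: the first four already satisfy the pattern.
five_numbers = bool(re.match(pattern, '1 2 3 4 5\n'))

print(with_newline, without_newline, five_numbers)   # True False True
```

If either case matters, something like `re.fullmatch(r'\d+(\s+\d+){3}', line.strip())` would be a stricter check.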
Upvotes: 0
Reputation: 758
So you're looking for a substring that contains exactly four integers separated by spaces and ended with a newline. You can use regular expressions to locate substrings that follow this pattern. Say you stored the string in the variable s:
import re
matches = [m[0] for m in re.findall(r"((\d+\s){4})", s)]
The matches variable now contains the strings with exactly four integers in them. Afterwards you can split each string and convert to integers if you want:
matches = [[int(i) for i in s.split(' ')] for s in matches]
Result:
[[567, 45, 32, 468], [974, 35, 3578, 4467], [325, 765, 355, 5466]]
Upvotes: 0
Reputation: 680
If you can assume that the rows you need will only have 4 numbers, then this solution should work:
nums = []
with open('filename.txt') as f:
    for line in f:
        line = line.split()
        if len(line) == 4 and all(c.isdigit() for c in line):
            # use [float(c) for c in line] if needed
            nums.append([int(c) for c in line])
print(nums)
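Keep in mind that `isdigit()` only accepts strings made of digits, so rows with negative or decimal numbers are filtered out before any `float()` conversion is reached. If such rows can occur, a sketch along these lines might help (the function name `parse_row` and the sample rows are made up for illustration):

```python
def parse_row(line, count=4):
    """Return the row as a list of floats, or None if it is not `count` numbers."""
    parts = line.split()
    if len(parts) != count:
        return None
    try:
        # float() raises ValueError on the first non-numeric element
        return [float(p) for p in parts]
    except ValueError:
        return None

print(parse_row('567 45 32 468'))       # [567.0, 45.0, 32.0, 468.0]
print(parse_row('-1 2.5 3e2 4'))        # [-1.0, 2.5, 300.0, 4.0]
print(parse_row('text text text 1'))    # None
print('-1'.isdigit(), '2.5'.isdigit())  # False False
```

Note that `float()` also accepts strings such as 'nan' and 'inf', so add an extra check if those should be rejected.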
Upvotes: 0
Reputation: 51643
Read the file line by line and analyse each line. Skip lines that do not have exactly 4 elements and lines that do not consist of 4 space-separated integers:
results = []
with open(filename) as f:
    for line in f:
        line = line.strip().split()
        if len(line) != 4:
            continue  # line has != 4 elements
        try:
            # list() forces the conversion here; a bare map() is lazy in Python 3
            numbers = list(map(int, line))
        except ValueError:
            continue  # line is not all numbers
        # do something with line
        results.append(line)  # or append(numbers) to add the integers
print(*results, sep="\n")
prints:
['567', '45', '32', '468']
['974', '35', '3578', '4467']
['325', '765', '355', '5466']
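A detail worth knowing when validating with `map`: in Python 3 it returns a lazy iterator, so `int` is only called once the result is consumed. A `try` block around a bare `map(int, ...)` therefore cannot catch a bad element. A minimal sketch with a made-up four-word line:

```python
parts = ['a', 'b', 'c', 'd']   # hypothetical line with four non-numeric words

# Creating the map object does not convert anything, so no error is raised yet.
lazy = map(int, parts)

try:
    list(lazy)              # int('a') is only called here
    raised_late = False
except ValueError:
    raised_late = True

print(raised_late)   # True
```

Wrapping the call in `list(...)` inside the `try` block is one way to make sure non-numeric lines are actually rejected.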
Upvotes: 0