I recently started learning programming, and I'm now using Python for data filtering. My question is: how do I get the string between two occurrences of a specific character? For example, in my text file I have lines like this:
5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:
and I want the string between the 10th delimiter ; and the next ;, or between the 15th : and the next :.
I have read the .txt file and extracted some information, but I couldn't figure out this specific part. Here's what I have so far:
import zipfile
arq = zipfile.ZipFile('DSts.zip')
for file in arq.namelist():
    print(file)
    f = arq.open(file)
    Lines = f.readlines()
    for line in Lines:
        print(f'{line[11:16]}')
This is a solution that you can integrate into your code. You would apply it to every line you read (or every line you think needs to be parsed like this):
def get_substring(input_string, delim, nth, delims):
    ''' Returns the substring between the nth delimiter in the string
    (counting every character in delims) and the next occurrence of
    delim; delims is a list of all delimiters to account for '''
    # Indices of all occurrences of delims
    idx_delims = [i for i, x in enumerate(input_string) if x in delims]
    # Retrieve the index of the nth delim
    idx_nth = idx_delims[nth-1]
    # Find the index of the next occurrence of delim
    idx_nth_p1 = input_string.index(delim, idx_nth+1)
    # Return the substring between those two positions
    return input_string[idx_nth+1:idx_nth_p1]
orig_string = '5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:'
print(orig_string)
# All delimiters
delims = [':', ';']
# Substring between the 10th delimiter (a ';') and the next ';'
str_1 = get_substring(orig_string, ';', 10, delims)
print(str_1)
# Substring between the 15th delimiter (a ':') and the next ':'
str_2 = get_substring(orig_string, ':', 15, delims)
print(str_2)
This function first collects the positions of every delimiter character in the input string. It then takes the position of the Nth delimiter, finds the next occurrence of delim after it, and returns the text between the two.
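To make those steps concrete, here is a step-by-step trace of the same logic on the example line for the ;-after-the-10th-delimiter case. It simply unrolls the body of get_substring with intermediate prints; the values in the comments come from the fact that every field in this line is two characters long.

```python
# Trace of get_substring's steps on the example line (nth=10, delim=';')
orig_string = '5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:'
delims = [':', ';']

# Step 1: positions of every ':' or ';'
idx_delims = [i for i, x in enumerate(orig_string) if x in delims]
print(idx_delims[:11])                  # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32]

# Step 2: position of the 10th delimiter (list index 9, since indexing starts at 0)
idx_nth = idx_delims[10 - 1]
print(idx_nth, orig_string[idx_nth])    # 29 ;

# Step 3: position of the next ';' after it
idx_nth_p1 = orig_string.index(';', idx_nth + 1)
print(idx_nth_p1)                       # 32

# Step 4: the characters strictly between the two positions
print(orig_string[idx_nth + 1:idx_nth_p1])   # 70
```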
In practice, this should include some validation, with relevant warnings or even raised exceptions (e.g. checking whether delim exists and whether it actually sits at the requested nth position). It could also be written more concisely; I made it longer for readability and understanding. Finally, you should remove the print statements in the final version.
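A more compact variant with the kind of checks described above might look like the sketch below. The name get_substring_checked, the choice of ValueError, and the error messages are my own assumptions for illustration; the lookup logic is the same as in get_substring.

```python
def get_substring_checked(input_string, delim, nth, delims):
    """Hypothetical variant of get_substring that validates its input and
    raises ValueError with a readable message instead of failing silently
    or with a bare IndexError."""
    if nth < 1:
        raise ValueError("nth must be 1 or greater")
    idx_delims = [i for i, x in enumerate(input_string) if x in delims]
    if len(idx_delims) < nth:
        raise ValueError(f"only {len(idx_delims)} delimiters found, need {nth}")
    start = idx_delims[nth - 1]
    # The nth delimiter overall must actually be the one requested
    if input_string[start] != delim:
        raise ValueError(f"delimiter #{nth} is {input_string[start]!r}, not {delim!r}")
    # str.find returns -1 instead of raising, so we can report a clearer error
    end = input_string.find(delim, start + 1)
    if end == -1:
        raise ValueError(f"no {delim!r} after delimiter #{nth}")
    return input_string[start + 1:end]


line = '5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:'
delims = [':', ';']
print(get_substring_checked(line, ';', 10, delims))   # 70
print(get_substring_checked(line, ':', 15, delims))   # 38
try:
    get_substring_checked(line, ';', 15, delims)
except ValueError as err:
    print(err)   # delimiter #15 is ':', not ';'
```

Using str.find instead of str.index is deliberate here: it lets the function distinguish "no closing delimiter" from the other failure cases.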
Update: This is minimal code that demonstrates the integration. You can test it standalone, then use its reading and post-processing approach instead of open() and readlines() in your original code. There's nothing wrong with either, but:
- open() needs a matching close(), while a with open(...) block closes the file for you behind the scenes, even if something crashes.
- readlines() reads the whole file at once. I often work with large files, so I'm used to sparing RAM by processing line by line. It's up to you and the problem you're working on.
So here is the example:
def get_substring(input_string, delim, nth, delims):
    ''' Returns the substring between the nth delimiter in the string
    (counting every character in delims) and the next occurrence of
    delim; delims is a list of all delimiters to account for '''
    # Indices of all occurrences of delims
    idx_delims = [i for i, x in enumerate(input_string) if x in delims]
    # Retrieve the index of the nth delim
    idx_nth = idx_delims[nth-1]
    # Find the index of the next occurrence of delim
    idx_nth_p1 = input_string.index(delim, idx_nth+1)
    # Return the substring between those two positions
    return input_string[idx_nth+1:idx_nth_p1]
# All delimiters
delims = [':', ';']
all_substrings = []
with open('testfile.txt', 'r') as fin:
    for line in fin:
        # Remove the leading and trailing whitespace
        line = line.strip()
        temp_str = get_substring(line, ':', 2, delims)
        all_substrings.append(temp_str)
print(all_substrings)
The code removes the trailing newline (and any other surrounding whitespace) with strip() and appends each substring to a list.
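Your original code reads the lines from a zip archive rather than from a plain text file. Below is a sketch of the same line-by-line approach applied to every member of the archive. It assumes the members are UTF-8 text (zipfile opens them in binary mode, so io.TextIOWrapper decodes them), and get_substrings_from_zip is a hypothetical helper name. get_substring is repeated so the snippet runs on its own.

```python
import io
import zipfile


def get_substring(input_string, delim, nth, delims):
    # Same logic as above: nth delimiter overall, then the next `delim`
    idx_delims = [i for i, x in enumerate(input_string) if x in delims]
    idx_nth = idx_delims[nth - 1]
    idx_nth_p1 = input_string.index(delim, idx_nth + 1)
    return input_string[idx_nth + 1:idx_nth_p1]


def get_substrings_from_zip(zip_source, delim, nth, delims, encoding='utf-8'):
    """Hypothetical helper: apply get_substring to every non-blank line of
    every file in a zip archive. Assumes members are text in `encoding`."""
    results = {}
    with zipfile.ZipFile(zip_source) as arq:
        for name in arq.namelist():
            if name.endswith('/'):  # skip directory entries
                continue
            # ZipFile.open yields bytes; TextIOWrapper decodes them lazily,
            # so the file is still processed one line at a time
            with io.TextIOWrapper(arq.open(name), encoding=encoding) as fin:
                results[name] = [get_substring(line.strip(), delim, nth, delims)
                                 for line in fin if line.strip()]
    return results


# Usage with the archive from the question:
# print(get_substrings_from_zip('DSts.zip', ':', 2, [':', ';']))
```

Blank lines are skipped because get_substring would fail on a line that has fewer than nth delimiters.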
One note: from the way you described your problem, it seemed to me that you wanted to match a specific delimiter at a position counted over all delimiters. For example, in 5d:6g:9h:5t:7a:45;33:12: the ; is the 6th delimiter, so the call becomes get_substring(line, ';', 6, delims). Let me know if this is not the case, but consider adjusting it yourself for practice. It also means the call you mentioned in the comment should be exactly as in the example, get_substring(line, ':', 2, delims), because : is the second delimiter. Also keep in mind that Python indexing starts at 0, so the second delimiter is actually at position 1 in the idx_delims list.
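If instead you want to count only occurrences of the delimiter you are looking for (the nth ; among the ; characters alone), str.split gives a compact sketch. The helper name get_between_occurrences is hypothetical. Note that under this reading your example line contains only five ; characters, so there is no 10th one.

```python
def get_between_occurrences(input_string, delim, nth):
    """Hypothetical helper: substring between the nth and (nth+1)th
    occurrences of `delim`, counting only that delimiter."""
    pieces = input_string.split(delim)
    # k occurrences of delim produce k+1 pieces; pieces[nth] is enclosed
    # on both sides only if an (nth+1)th occurrence exists
    if nth < 1 or nth + 1 >= len(pieces):
        raise ValueError(f"need at least {nth + 1} occurrences of {delim!r}")
    return pieces[nth]


line = '5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:'
print(get_between_occurrences(line, ';', 1))   # 33:12:5B:9J
print(get_between_occurrences(line, ';', 2))   # 70
```

Unlike get_substring, the result here may contain the other delimiter, as the first call shows.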
Finally, this is a minimal input file to test with:
5d:6g:9h:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:
5d:6g:9h:4t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85:
3d:7g:9i:5t:7a:45;33:12:5B:9J;70;9C;89;85:4B:38:16:9B:45:56:85: