Reputation: 73
For example, I have 7 files named:
g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt
g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt
g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt
g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt
g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt
g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt
g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt
I want to extract a name from these files, named as:
g18_84pp_2A_MVP_GoodiesT0_MIX.txt
Any idea for this? Thanks.
Is there any possible that I can only depend on the underscores?
For example, separating filename as
"g18_84pp_2A_MVP2", "_", "GoodiesT0-HKJ-DFG" "_", "MIX-CMVP2_Y1000-MIX", ".txt".
Take "g18_84pp_2A_MVP2"
without number 2
, take "GoodiesT0"
from "GoodiesT0-HKJ-DFG"
and take first "MIX"
from "MIX-CMVP2_Y1000-MIX"
, B/C I have a lot of files have different names for separating parts, I want it general as well
Upvotes: 0
Views: 70
Reputation: 46573
import re
names = ['g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt',
'g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt',
'g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt',
'g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt',
'g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt',
'g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt',
'g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt']
f = lambda x: re.findall('g18_84pp_2A_MVP(.*?)_GoodiesT0(.*?)_MIX(.*?)\.txt', x)
for x in names:
print(f(x))
Produces
[('1', '-HKJ-DFG', '-CMVP1_Y1000-MIX')]
[('2', '-HKJ-DFG', '-CMVP2_Y1000-MIX')]
[('3', '-HKJ-DFG', '-CMVP3_Y1000-MIX')]
[('4', '-HKJ-DFG', '-CMVP4_Y1000-MIX')]
[('5', '-HKJ-DFG', '-CMVP5_Y1000-MIX')]
[('6', '-HKJ-DFG', '-CMVP6_Y1000-MIX')]
[('7', '-HKJ-DFG', '-CMVP7_Y1000-MIX')]
Filter the names that doesn't match this pattern:
names = list(filter(f, names))
Since it's unclear what you're trying to do, this is going to be a good starting point.
UPDATE
The question was updated. Here is what you (probably) want to achieve:
import re
names = ['g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt',
'g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt',
'g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt',
'g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt',
'g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt',
'g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt',
'g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt']
expression = 'g18_84pp_2A_MVP(.*?)_Goodies(.*?)_MIX(.*?)\.txt'
f = lambda x: re.findall(expression, x)
_f = lambda x: len(re.findall(expression, x))==3
for x in names:
print(f(x))
Outputs
[('1', 'T0-HKJ-DFG', '-CMVP1_Y1000-MIX')]
[('2', 'T0-HKJ-DFG', '-CMVP2_Y1000-MIX')]
[('3', 'T0-HKJ-DFG', '-CMVP3_Y1000-MIX')]
[('4', 'T0-HKJ-DFG', '-CMVP4_Y1000-MIX')]
[('5', 'T0-HKJ-DFG', '-CMVP5_Y1000-MIX')]
[('6', 'T0-HKJ-DFG', '-CMVP6_Y1000-MIX')]
[('7', 'T0-HKJ-DFG', '-CMVP7_Y1000-MIX')]
If you need to filter the original list:
names = list(filter(_f, names))
Upvotes: 1