Zqdiac Thanks

Reputation: 57

Extract text between quotation marks using regex in Python

I am trying to extract the text inside double quotation marks ("").

File content:
"abc"
"ABC. XYZ"
"1 - 2 - 3"

Code I've tried using regex:

import re

title = re.findall(r'\"(.+?)\"', filecontent)
print(title)

Output:

['abc']
[]  # some lines come out empty like this
['1 - 2 - 3']

Some of the lines come out empty and I'm not sure why. Is there a better alternative way to do this?
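One way to narrow this down is a small debugging sketch, assuming the file is processed line by line (as the per-line output suggests). It prints `repr()` of every non-blank line where the pattern finds nothing, which exposes invisible differences. One possible cause, and it is only an assumption since the raw file is not shown, is typographic quotes (“ ”) instead of straight ASCII quotes `"`. The helper `find_unmatched_lines` and the sample lines are hypothetical.

```python
import re

# Non-greedy match of at least one character between straight double quotes
QUOTED = re.compile(r'"(.+?)"')

def find_unmatched_lines(lines):
    """Return (line_number, repr_of_line) for every non-blank line with no match."""
    unmatched = []
    for number, line in enumerate(lines, start=1):
        if line.strip() and not QUOTED.findall(line):
            # repr() makes hidden characters visible, e.g. curly quotes or odd whitespace
            unmatched.append((number, repr(line)))
    return unmatched

# Hypothetical sample: the second line uses typographic quotes, not ASCII ones
sample = ['"abc"', '\u201cABC. XYZ\u201d', '"1 - 2 - 3"']
for number, shown in find_unmatched_lines(sample):
    print(number, shown)
```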

Upvotes: 2

Views: 2083

Answers (3)

Arvind Kumar Avinash

Reputation: 79035

If you want to extract a substring from a string, you can use re.search.

Demo:

import re

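str_list = ['"abc"', '"ABC. XYZ"', '"1 - 2 - 3"']

for s in str_list:  # avoid naming the variable `str`, which shadows the built-in
    # re.search returns the first match object, or None if nothing matches
    match = re.search(r'"(.+?)"', s)
    if match:
        print(match.group(1))  # group(1) is the text captured between the quotes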

Output:

abc
ABC. XYZ
1 - 2 - 3
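Note that re.search returns only the first match in each string. If a single line can contain several quoted substrings, re.finditer (or re.findall) returns all of them. A short sketch; the sample line is made up:

```python
import re

line = '"abc" and "ABC. XYZ"'  # hypothetical line with two quoted parts

# re.search stops at the first match
first = re.search(r'"(.+?)"', line)
print(first.group(1))

# re.finditer yields a match object for every quoted part
all_parts = [m.group(1) for m in re.finditer(r'"(.+?)"', line)]
print(all_parts)
```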

Upvotes: 3

Giovanni

Reputation: 93

My solution is:

import re

my_strings = [
    'SetVariables "a" "b" "c" ',
    'd2efw   f "first" +&%#$%"second",vwrfhir, d2e   u"third" dwedew',
    '"uno"?>P>MNUIHUH~!@#$%^&*()_+=0trewq"due"        "tre"fef    fre f',
    '       "uno""dos"      "tres"',
    '"unu""doua""trei"',
    '      "um"                    "dois"           "tres"                  ',
]
for current_test_string in my_strings:
    # Non-greedy: capture the shortest non-empty run of characters between two quotes
    my_substrings = re.findall(r'"(.+?)"', current_test_string)
    print("my_substrings are:", my_substrings)

Alternative regular expressions you can use:

  • re.findall('"(.+?)"', current_test_string) [Avinash2021] [user17405772021]
  • re.findall('"(.*?)"', current_test_string) [Shelvington2020]
  • re.findall(r'"(.*?)"', current_test_string) [Lundberg2012] [Avinash2021]
  • re.findall(r'"(.+?)"', current_test_string) [Lundberg2012] [Avinash2021]
  • re.findall(r'"["]', current_test_string) [Muthupandi2019]
  • re.findall(r'"([^"]*)"', current_test_string) [Pieters2014]
  • re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', current_test_string) # Keeps the double quotes in each match; strip them afterwards if needed. [Booboo2020]
  • re.findall(r'"(.*?)(?<!\\)"', current_test_string) [Hassan2014]
  • re.findall('"[^"]*"', current_test_string) # Keeps the double quotes in each match; strip them afterwards if needed. [Martelli2013]
  • re.findall('"([^"]*)"', current_test_string) [jspcal2014]
  • re.findall("'(.*?)'", current_test_string) # Matches text between single quotes instead. [akhilmd2016]
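These patterns differ on edge cases. A small comparison sketch (the sample strings are hypothetical): `.+?` requires at least one character, so an empty pair `""` makes it pair the quotes incorrectly, while `.*?` and `[^"]*` return an empty string for it. On a backslash-escaped quote, `.*?` stops early, whereas the lookbehind version `(?<!\\)` only accepts a closing quote that is not preceded by a backslash.

```python
import re

empty_pair = '"a" "" "b"'
print(re.findall(r'"(.+?)"', empty_pair))    # at least one char: mispairs the quotes around ""
print(re.findall(r'"(.*?)"', empty_pair))    # allows an empty match
print(re.findall(r'"([^"]*)"', empty_pair))  # negated class never crosses a quote

escaped = r'"say \"hi\"" done'
print(re.findall(r'"(.*?)"', escaped))         # stops at the first (escaped) quote
print(re.findall(r'"(.*?)(?<!\\)"', escaped))  # closing quote must not follow a backslash
```

For `empty_pair`, the first pattern returns `['a', '" ']`, the other two return `['a', '', 'b']`.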

The current_test_string.split("\"") approach also works when the substrings are enclosed in balanced double quotation marks. It uses the double quotation mark as a delimiter to tokenize the string, so it also returns the pieces that lie outside the quotation marks; only every second token (the odd indices) is a quoted substring.
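A minimal sketch of that split-based approach, assuming the quotes are balanced and never escaped (the helper name `quoted_by_split` is hypothetical):

```python
def quoted_by_split(text):
    """Return the quoted substrings by splitting on the double quote character."""
    # Splitting '"x" y "z"' on '"' gives ['', 'x', ' y ', 'z', ''];
    # with balanced, unescaped quotes the quoted text sits at the odd indices.
    return text.split('"')[1::2]

print(quoted_by_split('SetVariables "a" "b" "c" '))
print(quoted_by_split('       "uno""dos"      "tres"'))
```

With an unmatched quote, the text after it is still returned as if it were quoted, so the regex versions are safer on messy input.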

References:

Upvotes: 0

I'mahdi

Reputation: 24049

IIUC, have you tried this?

import re

filecontent = '''
"abc"
"ABC. XYZ"
"1 - 2 - 3"
'''

print(re.findall(r'\"(.+?)\"', filecontent))

Output:

['abc', 'ABC. XYZ', '1 - 2 - 3']
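To apply the same call to a real file, read its whole content into one string first. A sketch; the helper `extract_quoted` and the file name `quotes.txt` are hypothetical, and the demo writes a temporary file so it runs on its own. Because `.` does not match a newline by default, each match stays within a single line.

```python
import re
import tempfile
from pathlib import Path

def extract_quoted(text):
    """Return every non-empty substring enclosed in double quotes on a single line."""
    return re.findall(r'"(.+?)"', text)

# Demo: write the question's sample to a temporary file, then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "quotes.txt"  # hypothetical file name
    path.write_text('"abc"\n"ABC. XYZ"\n"1 - 2 - 3"\n', encoding="utf-8")
    print(extract_quoted(path.read_text(encoding="utf-8")))
```

If a quoted text can span several lines, pass `flags=re.DOTALL` to `re.findall` so that `.` also matches newlines.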

Upvotes: 2
