spiceitup

Reputation: 146

How to join lines until a string is found, which should then start a new line, using Python?

I am very new to Python. I want to join lines until a certain string is found. Once the string is found, it should start a new line, and the rest of the lines in the paragraph should be joined onto that line.

I have tried joining lines by adding a separator, and this works:

with open('index1.txt') as fileindex:
    print(";".join(line.strip() for line in fileindex))

I then tried iteration but it only gave me lines that matched the last string:

with open('index1.txt', 'r') as content_file:
  indifile = content_file.read()
  for item in indifile.split("\n"):
      if "Group" in item:
        a = item.strip()
      if "Project" in item:
        b = item.strip()
      if "Manifest" in item:
        c = item.strip()
      if "POM" in item:
        d = item.strip()
      if "Embedded" in item:
        e = item.strip()
        indistrings = [a, b, c, d, e]
        sep = ';'
        print(sep.join(indistrings))

The file looks like this:

Group: ch.qos.logback Name: logback-core Version: 1.1.11 
Manifest Project URL: http://www.qos.ch
Manifest license URL: http://www.eclipse.org/legal/epl-v10.html,
http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html
POM License: Eclipse Public License - v 1.0 http://www.eclipse.org/legal/epl-v10.html
POM License: GNU Lesser General Public License \- http://www.gnu.org/licenses
/old-licenses/lgpl-2.1.html

Group: com.aol.simplereact Name: cyclops-react Version: 2.0.0-FINAL 
POM Project URL: https://github.com/aol/simple-react
POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt

Group: com.fasterxml Name: classmate Version: 1.3.4 
Project URL: http://github.com/FasterXML/java-classmate
Manifest license URL: http://www.apache.org/licenses/LICENSE-2.0.txt
POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt
Embedded license files: [classmate-1.3.4.jar/METAINF/LICENSE](classmate-1.3.4.jar/META-INF/LICENSE)

The result I would like is this:

Group: ch.qos.logback Name: logback-core Version: 1.1.11;Manifest Project URL: http://www.qos.ch;Manifest license URL: http://www.eclipse.org/legal/epl-v10.html, http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html;POM License: Eclipse Public License - v 1.0 http://www.eclipse.org/legal/epl-v10.html;POM License: GNU Lesser General Public License \- http://www.gnu.org/licenses
/old-licenses/lgpl-2.1.html

Group: com.aol.simplereact Name: cyclops-react Version: 2.0.0-FINAL;POM Project URL: https://github.com/aol/simple-react;POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt

and so on.

Any help would be much appreciated.

Upvotes: 1

Views: 330

Answers (4)

Rakesh

Reputation: 82765

You can do this with a simple iteration that starts a new group at every blank line.

Example:

data = """Group: ch.qos.logback Name: logback-core Version: 1.1.11 
Manifest Project URL: http://www.qos.ch
Manifest license URL: http://www.eclipse.org/legal/epl-v10.html,
http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html
POM License: Eclipse Public License - v 1.0 http://www.eclipse.org/legal/epl-v10.html
POM License: GNU Lesser General Public License \- http://www.gnu.org/licenses
/old-licenses/lgpl-2.1.html

Group: com.aol.simplereact Name: cyclops-react Version: 2.0.0-FINAL 
POM Project URL: https://github.com/aol/simple-react
POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt

Group: com.fasterxml Name: classmate Version: 1.3.4 
Project URL: http://github.com/FasterXML/java-classmate
Manifest license URL: http://www.apache.org/licenses/LICENSE-2.0.txt
POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt
Embedded license files: [classmate-1.3.4.jar/METAINF/LICENSE](classmate-1.3.4.jar/META-INF/LICENSE)
"""

result = []
for line in data.splitlines():               # iterate over each line
    if not result or not line.strip():       # first line, or a blank line between sections
        result.append([line.strip() + ";"])        # start a new group with this line
    else:
        result[-1].append(line.strip() + ";")      # add the line to the current group

result = ["".join(i).strip().strip(";") for i in result]        # join each group and drop the stray separators
print(result)

Output:

['Group: ch.qos.logback Name: logback-core Version: 1.1.11;Manifest Project URL: http://www.qos.ch;Manifest license URL: http://www.eclipse.org/legal/epl-v10.html,;http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html;POM License: Eclipse Public License - v 1.0 http://www.eclipse.org/legal/epl-v10.html;POM License: GNU Lesser General Public License \\- http://www.gnu.org/licenses;/old-licenses/lgpl-2.1.html',
 'Group: com.aol.simplereact Name: cyclops-react Version: 2.0.0-FINAL;POM Project URL: https://github.com/aol/simple-react;POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt',
 'Group: com.fasterxml Name: classmate Version: 1.3.4;Project URL: http://github.com/FasterXML/java-classmate;Manifest license URL: http://www.apache.org/licenses/LICENSE-2.0.txt;POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt;Embedded license files: [classmate-1.3.4.jar/METAINF/LICENSE](classmate-1.3.4.jar/META-INF/LICENSE)']
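
As a variation on the loop above, the same blank-line grouping can be sketched with itertools.groupby from the standard library. The function name join_paragraphs and the short sample string are made up for illustration; the sketch assumes, as above, that sections are separated by one or more blank lines.

```python
from itertools import groupby


def join_paragraphs(text, sep=";"):
    """Join the non-blank lines of each blank-line-separated section with sep."""
    stripped = (line.strip() for line in text.splitlines())
    # groupby yields runs of consecutive lines with the same truthiness;
    # only the runs of non-empty lines are kept.
    return [sep.join(group)
            for has_text, group in groupby(stripped, key=bool)
            if has_text]


# Hypothetical, shortened sample in the same shape as the question's file
sample = "Group: a \nProject URL: x\n\n\nGroup: b\nPOM License: y\n"
print(join_paragraphs(sample))
```

Because the separator is only placed between lines, there is nothing to strip afterwards, and several blank lines in a row do not produce empty entries.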

Upvotes: 0

RomanPerekhrest

Reputation: 92854

Iterative approach for Python 3.x:

(with initial separator ;)

with open('input.txt') as f:
    start_line = False          # flag indicating a starting line of a section
    for i, l in enumerate(f):   # iterate with counters (starting from `0`) 
        if not l.strip():       # on encountering empty line
            print(end='\n\n')
            start_line = True   # prepare for next new section
        else:
            print(('' if i == 0 or start_line else ';') + l.strip(), end='')
            start_line = False

The output:

Group: ch.qos.logback Name: logback-core Version: 1.1.11;Manifest Project URL: http://www.qos.ch;Manifest license URL: http://www.eclipse.org/legal/epl-v10.html,;http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html;POM License: Eclipse Public License - v 1.0 http://www.eclipse.org/legal/epl-v10.html;POM License: GNU Lesser General Public License \- http://www.gnu.org/licenses;/old-licenses/lgpl-2.1.html

Group: com.aol.simplereact Name: cyclops-react Version: 2.0.0-FINAL;POM Project URL: https://github.com/aol/simple-react;POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt

Group: com.fasterxml Name: classmate Version: 1.3.4;Project URL: http://github.com/FasterXML/java-classmate;Manifest license URL: http://www.apache.org/licenses/LICENSE-2.0.txt;POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt;Embedded license files: [classmate-1.3.4.jar/METAINF/LICENSE](classmate-1.3.4.jar/META-INF/LICENSE)

Upvotes: 1

King Peanut

Reputation: 116

It's not pretty, but you can do it like this:

with open('demo_file.txt', 'r') as f:
    # replace each line's newline with ';' and turn blank lines into a paragraph break
    text = ''.join([i.replace('\n', ';') if i.strip() else '\n\n' for i in f.readlines()])

print(text)

The result looks like this:

Group: ch.qos.logback Name: logback-core Version: 1.1.11 ;Manifest Project URL: http://www.qos.ch;Manifest license URL: http://www.eclipse.org/legal/epl-v10.html,;http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html;POM License: Eclipse Public License - v 1.0 http://www.eclipse.org/legal/epl-v10.html;POM License: GNU Lesser General Public License \- http://www.gnu.org/licenses;/old-licenses/lgpl-2.1.html;

Group: com.aol.simplereact Name: cyclops-react Version: 2.0.0-FINAL ;POM Project URL: https://github.com/aol/simple-react;POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt;

Group: com.fasterxml Name: classmate Version: 1.3.4 ;Project URL: http://github.com/FasterXML/java-classmate;Manifest license URL: http://www.apache.org/licenses/LICENSE-2.0.txt;POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt;Embedded license files: [classmate-1.3.4.jar/METAINF/LICENSE](classmate-1.3.4.jar/META-INF/LICENSE);
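
If the trailing ';' and the space before the first ';' are unwanted, one possible cleanup is to split the text on blank lines first and join the stripped lines of each block. This is only a sketch: the function name join_blocks and the sample string are hypothetical, and it assumes the whole file fits in memory as a single string.

```python
import re


def join_blocks(text, sep=";"):
    """Return the text with each blank-line-separated block joined on one line."""
    # Split on one or more blank (or whitespace-only) lines
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(
        sep.join(line.strip() for line in block.splitlines() if line.strip())
        for block in blocks
    )


# Hypothetical, shortened sample in the same shape as the question's file
sample = "Group: a \nProject URL: x\n\nGroup: b\nPOM License: y\n"
print(join_blocks(sample))
```

Stripping every line before joining removes the trailing space after the version number, and joining instead of replacing newlines means no separator is left at the end of a block.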

Upvotes: 0

Chris Doyle

Reputation: 11992

You could print every line followed by '; ' instead of "\n", and print two newline characters only when the line contains the string "Group".

mystring = """Group: ch.qos.logback Name: logback-core Version: 1.1.11
Manifest Project URL: http://www.qos.ch
Manifest license URL: http://www.eclipse.org/legal/epl-v10.html,
http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html
POM License: Eclipse Public License - v 1.0 http://www.eclipse.org/legal/epl-v10.html
POM License: GNU Lesser General Public License \- http://www.gnu.org/licenses
/old-licenses/lgpl-2.1.html

Group: com.aol.simplereact Name: cyclops-react Version: 2.0.0-FINAL
POM Project URL: https://github.com/aol/simple-react
POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt

Group: com.fasterxml Name: classmate Version: 1.3.4
Project URL: http://github.com/FasterXML/java-classmate
Manifest license URL: http://www.apache.org/licenses/LICENSE-2.0.txt
POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt
Embedded license files: [classmate-1.3.4.jar/METAINF/LICENSE](classmate-1.3.4.jar/META-INF/LICENSE)
"""

for line in mystring.split("\n"):
    if "Group" in line:
        print("\n")
    if line.strip(" "):
        print(line.strip(), end='; ')

This produces the output:

Group: ch.qos.logback Name: logback-core Version: 1.1.11; Manifest Project URL: http://www.qos.ch; Manifest license URL: http://www.eclipse.org/legal/epl-v10.html,; http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html; POM License: Eclipse Public License - v 1.0 http://www.eclipse.org/legal/epl-v10.html; POM License: GNU Lesser General Public License \- http://www.gnu.org/licenses; /old-licenses/lgpl-2.1.html; 

Group: com.aol.simplereact Name: cyclops-react Version: 2.0.0-FINAL; POM Project URL: https://github.com/aol/simple-react; POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt; 

Group: com.fasterxml Name: classmate Version: 1.3.4; Project URL: http://github.com/FasterXML/java-classmate; Manifest license URL: http://www.apache.org/licenses/LICENSE-2.0.txt; POM License: The Apache Software License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt; Embedded license files: [classmate-1.3.4.jar/METAINF/LICENSE](classmate-1.3.4.jar/META-INF/LICENSE); 
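
The desired output in the question joins the wrapped license URL onto the previous field rather than making it a field of its own. A rough sketch of that idea follows. It rests on assumptions that are not confirmed by the question: the label prefixes in LABELS are simply the five strings the asker tested for, any line that starts with none of them is treated as a continuation, and continuations are appended with a space. The names LABELS, join_records and cont_sep are invented for this example.

```python
# Prefixes taken from the strings tested in the question (an assumption)
LABELS = ("Group", "Project", "Manifest", "POM", "Embedded")


def join_records(lines, sep=";", cont_sep=" "):
    """Build one joined string per record, where "Group" opens a new record."""
    records = []                                  # one list of fields per record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                              # blank lines carry no data
        if line.startswith("Group") or not records:
            records.append([line])                # open a new record
        elif line.startswith(LABELS):
            records[-1].append(line)              # labelled line: a new field
        else:
            records[-1][-1] += cont_sep + line    # assumed continuation line
    return [sep.join(fields) for fields in records]


# Hypothetical, shortened sample in the same shape as the question's file
sample = ["Group: a ", "Manifest license URL: u1,", "u2", "", "Group: b", "POM License: y"]
for record in join_records(sample):
    print(record)
```

Passing cont_sep="" would glue a URL that was wrapped mid-path back together instead; which of the two is right depends on how the file was wrapped.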

Upvotes: 0
