Sebastian

Reputation: 967

How do I parse a block of text into rows?

I'm trying to parse a block of text into individual rows. The text is saved in a text file, and my goal is to put each separate chunk of text on its own row.

ggplot2 is a data visualization package for the statistical programming language R. Created by Hadley Wickham in 2005, ggplot2 is an implementation of Leland Wilkinson's Grammar of Graphics—a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers. ggplot2 can serve as a replacement for the base graphics in R and contains a number of defaults for web and print display of common scales. Since 2005, ggplot2 has grown in use to become one of the most popular R packages.[1][2] It is licensed under GNU GPL v2.[3]

Source: https://en.wikipedia.org/wiki/Ggplot2

I want to make a table where each occurrence of "ggplot2" starts a new row containing "ggplot2" and the text that follows it.

Row Text    Separator
1   ggplot2 is a data visualization package for the statistical programming language R. Created by Hadley Wickham in 2005,  "ggplot2"
2   ggplot2 is an implementation of Leland Wilkinson's Grammar of Graphics—a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers.   "ggplot2"
3   ggplot2 can serve as a replacement for the base graphics in R and contains a number of defaults for web and print display of common scales. Since 2005,     "ggplot2"
4   ggplot2 has grown in use to become one of the most popular R packages.[1][2] It is licensed under GNU GPL v2.[3]    "ggplot2"

The formatting is off, but the separator column is "ggplot2" for each row.

This is what I tried:

text = open('ggplot2.txt','r+')
l=[]
for i in text.readlines():
    if i == "ggplot2":
        l.newline(i)

Upvotes: 0

Views: 105

Answers (2)

Bob White

Reputation: 733

Your code raises AttributeError: 'list' object has no attribute 'newline'. To add an item to a list, use the append method.
Example:

table.append(item)

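I think you should try this:

table = []
# Read-only mode is enough; the with block closes the file automatically.
with open('ggplot2.txt', 'r') as text:
    for row in text:
        if "ggplot2" in row:
            # Skip the text before the first "ggplot2" (empty if the line starts with it).
            for e in row.split('ggplot2')[1:]:
                # Row number, text with the separator restored, separator.
                table.append([len(table) + 1, 'ggplot2' + e.strip('\n'), 'ggplot2'])

print(table)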

A list doesn't have a method called newline; you probably meant append.
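
If you want to check the splitting logic without a file, here is a minimal sketch that works on a plain string. The function name split_rows and the shortened sample text are my own, made up for illustration. It assumes every row should start at an occurrence of "ggplot2" and that any text before the first occurrence can be dropped.

```python
def split_rows(text, separator="ggplot2"):
    """Return (row_number, row_text, separator) tuples, one per occurrence of separator."""
    rows = []
    # pieces[0] is whatever precedes the first separator, so it is skipped.
    pieces = text.split(separator)
    for piece in pieces[1:]:
        # Put the separator back in front and trim surrounding whitespace.
        rows.append((len(rows) + 1, (separator + piece).strip(), separator))
    return rows


# Shortened stand-in for the Wikipedia paragraph (hypothetical sample).
sample = ("ggplot2 is a package. Created in 2005, "
          "ggplot2 is an implementation. "
          "ggplot2 can serve as a replacement.")

for row in split_rows(sample):
    print(row)
```

Each tuple maps onto the Row, Text and Separator columns in your table, so you could pass the result to csv.writer or a DataFrame if you need an actual table.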

Upvotes: 0

Spencer Wieczorek

Reputation: 21575

You can use .append() to create your rows and split by "ggplot2" to get the lines you want:

text = "ggplot2 is a data visualization package for the statistical programming language R. Created by Hadley Wickham in 2005, ggplot2 is an implementation of Leland Wilkinson's Grammar of Graphics—a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers. ggplot2 can serve as a replacement for the base graphics in R and contains a number of defaults for web and print display of common scales. Since 2005, ggplot2 has grown in use to become one of the most popular R packages.[1][2] It is licensed under GNU GPL v2.[3]"

lines = text.split("ggplot2")
rows = []

for line in lines:
    if line != "":
        rows.append("ggplot2" + line)

print(rows)

The issue with doing i == "ggplot2" in your code above is that it's checking if the entire line of the parsed text is equal to the string "ggplot2", and not if it contains the string "ggplot2".
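
To make the difference concrete, here is a small sketch comparing the two checks. The sample lines are made up for illustration. Note that lines returned by readlines() keep their trailing newline, so even a line containing only "ggplot2" would fail the equality test unless you strip it first.

```python
line = "ggplot2 is a data visualization package\n"

# Equality compares the whole string, so this is False.
is_equal = line == "ggplot2"

# Membership checks for a substring, so this is True.
contains = "ggplot2" in line

# A line holding only the word still carries its newline from readlines().
bare_line = "ggplot2\n"
bare_equal = bare_line == "ggplot2"
bare_equal_stripped = bare_line.strip() == "ggplot2"

print(is_equal, contains, bare_equal, bare_equal_stripped)
```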

Upvotes: 1
