babikar

Reputation: 611

lists and sublists

I use this code to split some data into a list with three sublists, splitting wherever there is a * or a -. But it also picks up the \n\n * sequences inside the text, and I don't know why. I don't want those to be read. Can someone tell me what I'm doing wrong? This is the data:

*Quote of the Day -Education is the ability to listen to almost anything without losing your temper or your self-confidence - Robert Frost -Education is what survives when what has been learned has been forgotten - B. F. Skinner *Fact of the Day -Fractals, an important part of chaos theory, are very useful in studying a huge amount of areas. They are present throughout nature, and so can be used to help predict many things in nature. They can also help simulate nature, as in graphics design for movies (animating clouds etc), or predict the actions of nature. -According to a recent survey by Just-Eat, not everyone in The United Kingdom actually knows what the Scottish delicacy, haggis is. Of the 1,623 British people polled:\n\n * 18% of Brits thought haggis was some sort of Scottish animal.\n\n * 15% thought it was a Scottish musical instrument.\n\n * 4% thought it was a character from Harry Potter.\n\n * 41% didn't even know what Scotland's national dish was.\n\nWhile a small number of Scots admitted not knowing what haggis was either, they also discovered that 68% of Scots would like to see Haggis delivered as takeaway. -With the growing concerns involving Facebook and its ever changing privacy settings, a few software developers have now engineered a website that allows users to trawl through the status updates of anyone who does not have the correct privacy settings to prevent it.\n\nNamed Openbook, the ultimate aim of the site is to further expose the problems with Facebook and its privacy settings to the general public, and show people just how easy it is to access this type of information about complete strangers. The site works as a search engine so it is easy to search terms such as 'don't tell anyone' or 'I hate my boss', and searches can also be narrowed down by gender. *Pet of the Day -Scottish Terrier -Land Shark -Hamster -Tse Tse Fly END

I use this code:

contents = open("data.dat").read()
data = contents.split('*') #split the data at the '*'

newlist = [item.split("-") for item in data if item]

This gives something close to the list I need, but it is wrong.

Upvotes: 1

Views: 304

Answers (4)

sunetos

Reputation: 3508

The "\n\n" is part of the input data, so it's preserved in Python. Just add a strip() to remove it. Since newlist is a list of lists, strip each string inside each sublist:

finallist = [[sub.strip() for sub in item] for item in newlist]

See the strip() docs: http://docs.python.org/library/stdtypes.html#str.strip

UPDATED FROM COMMENT:

finallist = [[sub.replace("\\n", "\n").strip() for sub in item] for item in newlist]
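
As a rough illustration of why both steps matter, here is a small self-contained sketch. The sample string is made up to mimic the question's data (it is not the real file), and it assumes the file holds the two characters backslash and n rather than real newlines:

```python
# Hypothetical sample mimicking the question's data; the raw string keeps
# each backslash-n pair as a literal two-character sequence.
sample = r"*Title -first item\n\n -second item "

newlist = [item.split("-") for item in sample.split("*") if item]

# strip() only removes real whitespace, so the literal \n\n text survives
stripped_only = [[sub.strip() for sub in item] for item in newlist]

# turning the literal sequence into real newlines first lets strip() remove it
finallist = [[sub.replace("\\n", "\n").strip() for sub in item] for item in newlist]

print(stripped_only)
print(finallist)
```

If the file contained real newline characters instead, the plain strip() version would already be enough.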

Upvotes: 2

Nas Banov

Reputation: 29019

The following solves your problem, I believe:

result = [[subitem.replace(r'\n\n', '\n') for subitem in item.split('\n-')]
          for item in open('data.txt').read().split('\n*')]

# now let's pretty-print the result
for i in result:
    print('***', i[0], '***')   # section title
    for j in i[1:]:
        print('\t--', j)        # entries under that title
    print()

Note that I split on a newline followed by * or -, so it won't split on dashes or asterisks inside the text. I also replace the literal character sequence backslash, n, backslash, n (r'\n\n') with a real newline character '\n'. The one-line expression is a list comprehension, a way to construct a list in one go, without multiple .append() calls or + concatenations.
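
To make the difference concrete, here is a minimal sketch comparing the two ways of splitting. The sample text is invented for illustration and assumes every section and entry starts on its own line. Prepending "\n" is an extra step of mine (not in the code above) so that the very first '*' is also matched:

```python
# Hypothetical sample: markers sit at the start of a line, and the text
# itself also contains dashes.
sample = (
    "*Quote of the Day\n"
    "-Keep your self-confidence - Robert Frost\n"
    "*Pet of the Day\n"
    "-Tse Tse Fly\n"
)

# Splitting on the bare characters also breaks on dashes inside the text
naive = [item.split("-") for item in sample.split("*") if item]

# Splitting on newline + marker only breaks at the start of a line
careful = [
    [sub.strip() for sub in item.split("\n-")]
    for item in ("\n" + sample).split("\n*")
    if item
]

print(len(naive[0]))
print(careful)
```

Here the naive version cuts the Robert Frost entry into several pieces, while the second version keeps it whole.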

Upvotes: 0

Sean Woods

Reputation: 2522

This is going to split on any asterisk you have in the text as well.

A better implementation would be something like:

lines = []

for line in open("data.dat"):
    # lstrip is a method, so it must be called: lstrip()
    if line.lstrip().startswith("*"):
        lines.append([line.strip()])  # append a list with your line
    elif line.lstrip().startswith("-"):
        lines[-1].append(line.strip())  # add to the most recent sublist

For more homework, research what's happening when you use the open() function in this way.

Upvotes: 0

Max

Reputation: 4932

open("data.dat").read() reads every character in the file, not only the ones you want. If you don't need the '\n' characters, you can try contents.replace("\n", ""), or read the file line by line (not the whole content at once) and cut the trailing '\n' off each line.
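
Both options might look roughly like this. It is only a sketch: io.StringIO stands in for the real file so the example runs on its own, and the sample text is made up:

```python
import io

# io.StringIO stands in for open("data.dat") so no file is needed
fake_file = io.StringIO("*Pet of the Day\n-Hamster\n-Tse Tse Fly\n")

# Option 1: read everything, then drop every newline character
contents = fake_file.read()
no_newlines = contents.replace("\n", "")

# Option 2: iterate line by line and cut only the trailing newline
fake_file.seek(0)
lines = [line.rstrip("\n") for line in fake_file]

print(no_newlines)
print(lines)
```

Note that both options only deal with real newline characters. If the file contains the literal two-character text \n, that has to be replaced separately.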

Upvotes: 1
