Reputation: 15089
I am working on processing some stored data. After pre-processing, the data looks like this, for example:
-1|news.cnet.com|Technology News - CNET News|-1|-1
-1|news.google.com|Google News|-1|-1
-1|www.bbc.co.uk|BBC News - Home|-1|-1
-1|www.cnn.com|CNN.com|-1|-1
-1|www.news.com.au|News.com.au|-1|-1
1|news.google.com|-1|2|5,156,672
2|www.cnn.com|-1|71|325,362
3|www.news.com.au|-1|569|74,584
4|www.bbc.co.uk|-1|49|442,302
5|news.cnet.com|-1|107|187,705
The format is INDEX|URL|TITLE|RANK|SLI.
The value -1
indicates that the column has no specific value.
There may be duplicate entries with the same URL
; merging them produces one complete record.
Is there a neat trick for quickly combining these records into complete ones? I don't want to loop over all the lines to find each duplicate and merge it.
EDIT: The expected output is:
1|news.google.com|Google News|2|5,156,672
2|www.cnn.com|CNN.com|71|325,362
3|www.news.com.au|News.com.au|569|74,584
4|www.bbc.co.uk|BBC News - Home|49|442,302
5|news.cnet.com|Technology News - CNET News|107|187,705
EDIT 2:
By using pandas, as root
suggested below, I'm able to merge the data columns:
from pandas import *
frame = read_csv(r'data.txt', sep='|', names=['index', 'url', 'title', 'rank', 'sli'])
mask = frame['index'].map(lambda x: x > 0)  # rows that carry a real index
frame1 = frame[mask].set_index('url')       # records with rank and sli
frame2 = frame[~mask].set_index('url')      # records with titles
frame1.title = frame2.title                 # fill in titles by matching url
frame1 = frame1.reset_index().set_index('index')  # set_index returns a new frame
print frame1
However, is there a quick way to do this without any third-party libs?
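For completeness, a standard-library-only sketch of the same merge: one pass over the lines, keyed by URL, replacing -1 placeholders as real values turn up. The inline `data` string stands in for the contents of data.txt:

```python
# Merge duplicate records by URL in a single pass, stdlib only.
# `data` stands in for the file contents; in practice read it from data.txt.
data = """\
-1|news.cnet.com|Technology News - CNET News|-1|-1
-1|news.google.com|Google News|-1|-1
-1|www.bbc.co.uk|BBC News - Home|-1|-1
-1|www.cnn.com|CNN.com|-1|-1
-1|www.news.com.au|News.com.au|-1|-1
1|news.google.com|-1|2|5,156,672
2|www.cnn.com|-1|71|325,362
3|www.news.com.au|-1|569|74,584
4|www.bbc.co.uk|-1|49|442,302
5|news.cnet.com|-1|107|187,705
"""

records = {}
for line in data.splitlines():
    fields = line.split('|')
    url = fields[1]
    if url in records:
        # Keep existing values; fill -1 placeholders from the new row.
        records[url] = [new if old == '-1' else old
                        for old, new in zip(records[url], fields)]
    else:
        records[url] = fields

# Sort by the (now filled-in) index and join the rows back together.
merged = ['|'.join(r) for r in
          sorted(records.values(), key=lambda r: int(r[0]))]
for row in merged:
    print(row)
```

The dict lookup makes each URL a constant-time merge, so the whole pass is linear in the number of lines regardless of where the duplicates sit.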
Upvotes: 2
Views: 120
Reputation: 80346
You can load the data into a pandas DataFrame
and process it.
from pandas import *
In [360]: frame=read_csv(r'C:\Python26\test.csv',sep='|', names=['index', 'url', 'title','rank','sli'])
In [361]: print frame
index url title rank sli
0 -1 news.cnet.com Technology News - CNET News -1 -1
1 -1 news.google.com Google News -1 -1
2 -1 www.bbc.co.uk BBC News - Home -1 -1
3 -1 www.cnn.com CNN.com -1 -1
4 -1 www.news.com.au News.com.au -1 -1
5 1 news.google.com -1 2 5,156,672
6 2 www.cnn.com -1 71 325,362
7 3 www.news.com.au -1 569 74,584
8 4 www.bbc.co.uk -1 49 442,302
9 5 news.cnet.com -1 107 187,705
In [362]: mask = frame['index'].map(lambda x: x>0)
In [363]: frame = frame[mask]
In [364]: print frame
index url title rank sli
5 1 news.google.com -1 2 5,156,672
6 2 www.cnn.com -1 71 325,362
7 3 www.news.com.au -1 569 74,584
8 4 www.bbc.co.uk -1 49 442,302
9 5 news.cnet.com -1 107 187,705
If you have further duplicates, use:
df.drop_duplicates()
Also, notice that after you have dropped the duplicates
you can "reindex":
In [372]: print frame.set_index('index')
url title rank sli
index
1 news.google.com -1 2 5,156,672
2 www.cnn.com -1 71 325,362
3 www.news.com.au -1 569 74,584
4 www.bbc.co.uk -1 49 442,302
5 news.cnet.com -1 107 187,705
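If you also want the titles pulled into the surviving rows (rather than leaving the -1 placeholders), one option on a more recent pandas than the session above is to turn the placeholders into missing values and take the first non-null value per url with groupby(...).first(). A sketch, with the sample data inlined to keep it self-contained:

```python
import numpy as np
import pandas as pd
from io import StringIO

# Same sample data as above, inlined instead of read from a file.
data = StringIO("""\
-1|news.cnet.com|Technology News - CNET News|-1|-1
-1|news.google.com|Google News|-1|-1
-1|www.bbc.co.uk|BBC News - Home|-1|-1
-1|www.cnn.com|CNN.com|-1|-1
-1|www.news.com.au|News.com.au|-1|-1
1|news.google.com|-1|2|5,156,672
2|www.cnn.com|-1|71|325,362
3|www.news.com.au|-1|569|74,584
4|www.bbc.co.uk|-1|49|442,302
5|news.cnet.com|-1|107|187,705
""")

frame = pd.read_csv(data, sep='|',
                    names=['index', 'url', 'title', 'rank', 'sli'])

# Treat -1 (int columns) and '-1' (string columns) as missing, then
# collapse the duplicate urls: GroupBy.first() takes the first
# non-null value in each column of a group.
frame = frame.replace([-1, '-1'], np.nan)
merged = (frame.groupby('url', as_index=False).first()
               .astype({'index': int, 'rank': int})
               .sort_values('index')
               .set_index('index'))
print(merged)
```

This yields one complete row per url, matching the expected output in the question.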
Upvotes: 3