therourke

Reputation: 21

Use Yahoo Pipes to convert RSS html tags to standard tag items

I want to move from the bookmarking service Delicious to Diigo, but the way Diigo organises tags in its RSS feed is preventing the move.

I want to use a Yahoo Pipe to turn Diigo RSS tags into the same format as Delicious RSS tags.

Diigo tags are stored as an HTML list at the bottom of the 'Description' item, like this:

Some text describing the link.

<p class="diigo-tags"><strong>Tags:</strong>

    <a rel="nofollow" target="_blank" href='https://www.diigo.com/user/username/firsttag'>firsttag</a>

    <a rel="nofollow" target="_blank" href='https://www.diigo.com/user/username/2ndtag'>2ndtag</a>

    <a rel="nofollow" target="_blank" href='https://www.diigo.com/user/username/anothertag'>anothertag</a>

etc... </p>

I need to extract each of these and store them in their own item. Delicious stores each tag as a numbered entry in a nested category field, like this:

category
  0
   domain http://delicious.com/username/
   content firsttag
  1
   domain http://delicious.com/username/
   content 2ndtag

So, the Yahoo Pipe needs to strip the HTML list and put each tag into its own category field.

I'm not sure where to start, except maybe with this regular expression in the Regex operator to strip the HTML:

(?si)<a[^<>]*?[^<>]*>(.*?)</a>
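
To make the goal concrete, here is a rough Python sketch (outside of Pipes) of the transformation I'm after. It applies the regular expression above to the sample description and builds one category entry per tag. The function name `diigo_tags_to_categories` and the `domain` argument are made up for illustration, and the domain value is just the Delicious one from the example above.

```python
import re

# The regular expression from the question, unchanged.
TAG_LINK = re.compile(r"(?si)<a[^<>]*?[^<>]*>(.*?)</a>")

def diigo_tags_to_categories(description, domain):
    """Return one {'domain', 'content'} entry per <a>...</a> tag link.

    Hypothetical helper: the list index plays the role of the
    0, 1, 2... numbering in the Delicious category field.
    """
    tags = TAG_LINK.findall(description)
    return [{"domain": domain, "content": tag.strip()} for tag in tags]

sample_description = (
    "Some text describing the link.\n"
    '<p class="diigo-tags"><strong>Tags:</strong>\n'
    "    <a rel=\"nofollow\" target=\"_blank\" "
    "href='https://www.diigo.com/user/username/firsttag'>firsttag</a>\n"
    "    <a rel=\"nofollow\" target=\"_blank\" "
    "href='https://www.diigo.com/user/username/2ndtag'>2ndtag</a>\n"
    "</p>"
)

categories = diigo_tags_to_categories(
    sample_description, "http://delicious.com/username/"
)
for index, category in enumerate(categories):
    print(index, category["domain"], category["content"])
```

What I can't work out is how to get the same result from Pipes operators.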

Any advice appreciated.

Upvotes: 1

Views: 230

Answers (1)

janos

Reputation: 124646

You can extract the tags from the diigo stream by performing the following replacements using the Regex operator:

  • replace <a[^<>]*?[^<>]*>(.*?)</a> with $1, using options g and s (this keeps only the tag name inside each <a>...</a>)
  • replace <.+> with nothing, using options g and m (this deletes the remaining HTML tags)
  • replace [\s]+ with a single space, using options g and s (this collapses runs of whitespace)
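
For anyone who wants to try the three replacements outside Pipes, here is a minimal Python sketch. It assumes that the Pipes options g, s and m correspond to a global `re.sub`, `re.S` and `re.M`, and that `$1` corresponds to `\1`; the function name `flatten_diigo_description` is hypothetical, and the sample is the tag paragraph from the question.

```python
import re

def flatten_diigo_description(description):
    """Apply the three replacements in order (hypothetical helper)."""
    # 1. Keep only the link text of each <a>...</a> (options g and s).
    text = re.sub(r"<a[^<>]*?[^<>]*>(.*?)</a>", r"\1", description, flags=re.S)
    # 2. Delete the remaining tags (options g and m). The pattern is greedy,
    #    so on a single line it removes everything from the first "<"
    #    to the last ">", including any text between two tags.
    text = re.sub(r"<.+>", "", text, flags=re.M)
    # 3. Collapse each run of whitespace into one space (options g and s).
    text = re.sub(r"[\s]+", " ", text, flags=re.S)
    return text

sample = (
    '<p class="diigo-tags"><strong>Tags:</strong>\n'
    "    <a rel=\"nofollow\" target=\"_blank\" "
    "href='https://www.diigo.com/user/username/firsttag'>firsttag</a>\n"
    "    <a rel=\"nofollow\" target=\"_blank\" "
    "href='https://www.diigo.com/user/username/2ndtag'>2ndtag</a>\n"
    "</p>"
)

flattened = flatten_diigo_description(sample)
print(flattened.split())
```

Note that, in this sketch, plain text on its own line ahead of the tag paragraph is not inside any tag, so the three replacements leave it in place in front of the tags.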

As a result, the description field now contains the list of tags separated by spaces. I'm not sure what you need next; if you tell me, I can try to help.

Here's the pipe:

https://pipes.yahoo.com/pipes/pipe.info?_id=1656d9fcab9d9ed6016bdae7486ee71f

UPDATE

I see, the tricky part is adding multiple category nodes to an RSS feed. Unfortunately, I don't think that's possible. I updated the pipe, so that now you have item.category.1, .2, .3, and so on, but when you look at the RSS output of the pipe, it doesn't show any categories. (I think this might be related to the fact that the Create RSS operator doesn't have a category field either.)

In the JSON output, the multiple categories appear correctly.

I also tested that if there is only one category field, it shows up correctly in the RSS output. If there is more than one, none of them show up.

And I'm afraid this is as far as I can get you.

Upvotes: 0
