Dan

Reputation: 257

Split comma separated items into list in scrapy

Issue

I want to extract the keywords from the following code and store them as separate list items in JSON.

<meta name="keywords" content="keyword1, keyword2, keyword3">

So far, I have been using the following code:

'keywords': [i.split(', ') for i in response.xpath('//meta[@name="keywords"]/@content').extract()]

Result now

This results in a JSON file that looks like this:

keywords:
     0:
        0: keyword1
        1: keyword2
        2: keyword3

Or in raw data like this:

{"keywords": [["keyword1", "keyword2", "keyword3"]]}

Expected Result

But I need them separated as follows:

keywords:
     0:
        0: keyword1
     1:
        0: keyword2
     2:
        0: keyword3

Or put in raw data:

{"keywords": [["keyword1"], ["keyword2"], ["keyword3"]]}

Any ideas on how to solve this?

Upvotes: 1

Views: 541

Answers (2)

vezunchik

Reputation: 3717

Try:

>>> from scrapy import Selector
>>> sel = Selector(text="""<meta name="keywords" content="keyword1, keyword2, keyword3">""")
>>> keywords = sel.xpath('//meta[@name="keywords"]/@content').get()
>>> [[i] for i in keywords.split(', ')]
[[u'keyword1'], [u'keyword2'], [u'keyword3']]

Or, keeping one outer list per matching meta tag (note the extra level of nesting):

>>> [[[k] for k in i.split(', ')] for i in sel.xpath('//meta[@name="keywords"]/@content').extract()]
[[[u'keyword1'], [u'keyword2'], [u'keyword3']]]

UPD:

It may be better to split the logic into two cases, like this:

>>> keywords = []
>>> for i in sel.xpath('//meta[@name="keywords"]/@content').extract():
...     if ',' in i:
...         for k in i.split(','):
...             keywords.append([k.strip()])
...     else:
...         keywords.append([i.strip()])
... 
>>> keywords
[[u'keyword1'], [u'keyword2'], [u'keyword3']]
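
The same idea can be wrapped in a small helper. This is only a sketch: `split_keywords` is a hypothetical name, and the `contents` list stands in for whatever `.extract()` returns in the spider. Since `split(',')` on a string without a comma already returns a one-item list, the two cases collapse into one loop, and empty entries (for example from a trailing comma) are skipped.

```python
def split_keywords(contents, sep=","):
    """Return one single-item list per keyword found in the given strings."""
    result = []
    for content in contents:
        # split() on a string without the separator yields a one-item list,
        # so no separate branch is needed for that case.
        for keyword in content.split(sep):
            keyword = keyword.strip()
            if keyword:  # skip empty entries such as a trailing comma
                result.append([keyword])
    return result


# Stand-in for response.xpath('//meta[@name="keywords"]/@content').extract()
contents = ["keyword1, keyword2,keyword3, "]
keywords = split_keywords(contents)
print(keywords)
```

In the item this would presumably be used as 'keywords': split_keywords(response.xpath('//meta[@name="keywords"]/@content').extract()).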

Upvotes: 3

Pragya

Reputation: 111

Try changing the code to:

'keywords': [[x] for i in response.xpath('//meta[@name="keywords"]/@content').extract() for x in i.split(', ')]

Iterating over the result of i.split(', ') and wrapping each item in [] generates the individual arrays.
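
The nesting can be checked without running a spider by substituting a plain list for the `extract()` result. In this sketch, `contents`, `per_tag`, and `per_keyword` are illustrative names only; it contrasts using the split result as the item with iterating over it.

```python
# Stand-in for the list of @content strings returned by extract()
contents = ["keyword1, keyword2, keyword3"]

# One list per meta tag: the whole split result becomes a single item.
per_tag = [c.split(", ") for c in contents]

# One single-item list per keyword: a second for-clause walks the split result.
per_keyword = [[k] for c in contents for k in c.split(", ")]

print(per_tag)
print(per_keyword)
```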

Upvotes: 0
