Reputation: 257
I want to extract the keywords from the following HTML and store them as separate list items in JSON.
<meta name="keywords" content="keyword1, keyword2, keyword3">
So far, I have been using the following code:
'keywords': [i.split(', ') for i in response.xpath('//meta[@name="keywords"]/@content').extract()]
This results in a JSON file that looks like this:
keywords:
  0:
    0: keyword1
    1: keyword2
    2: keyword3
Or in raw data like this:
{"keywords": [["keyword1", "keyword2", "keyword3"]]}
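The extra level of nesting comes from the list comprehension itself: `extract()` returns a list of strings (one per matching `content` attribute), and `split(', ')` turns each of those strings into its own list. A minimal sketch without Scrapy, assuming a hypothetical `contents` list stands in for the return value of `extract()`:

```python
# Hypothetical stand-in for response.xpath('//meta[@name="keywords"]/@content').extract(),
# which returns a list with one string per matching attribute.
contents = ["keyword1, keyword2, keyword3"]

# One inner list is built per content string, so even a single <meta> tag
# produces a list of lists.
keywords = [i.split(', ') for i in contents]
print(keywords)  # [['keyword1', 'keyword2', 'keyword3']]
```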
But I need them separated as follows:
keywords:
  0:
    0: keyword1
  1:
    0: keyword2
  2:
    0: keyword3
Or, as raw data:
{"keywords": [["keyword1"], ["keyword2"], ["keyword3"]]}
Any ideas on how to solve this?
Upvotes: 1
Views: 541
Reputation: 3717
Try:
>>> from scrapy import Selector
>>> sel = Selector(text="""<meta name="keywords" content="keyword1, keyword2, keyword3">""")
>>> keywords = sel.xpath('//meta[@name="keywords"]/@content').get()
>>> [[i] for i in keywords.split(', ')]
[[u'keyword1'], [u'keyword2'], [u'keyword3']]
Or:
>>> [[[k] for k in i.split(', ')] for i in sel.xpath('//meta[@name="keywords"]/@content').extract()]
[[[u'keyword1'], [u'keyword2'], [u'keyword3']]]
Update:
It may be better to split the logic into two cases, like this:
>>> keywords = []
>>> for i in sel.xpath('//meta[@name="keywords"]/@content').extract():
...     if ',' in i:
...         for k in i.split(','):
...             keywords.append([k.strip()])
...     else:
...         keywords.append([i.strip()])
...
>>> keywords
[[u'keyword1'], [u'keyword2'], [u'keyword3']]
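The `if ',' in i` check is not strictly required: `str.split(',')` on a string with no commas returns a one-element list, so a single loop covers both cases. Below is a sketch of the same logic as a plain function; `split_keywords` is a hypothetical helper name, and dropping empty entries (e.g. from a trailing comma) is an extra assumption about what you want:

```python
def split_keywords(contents):
    """Turn a list of comma-separated strings into [[kw1], [kw2], ...]."""
    keywords = []
    for content in contents:
        # split(',') returns [content] when there is no comma,
        # so single-keyword attributes need no special case.
        for k in content.split(','):
            k = k.strip()  # remove spaces around each keyword
            if k:  # skip empty pieces, e.g. from "a, b," or an empty attribute
                keywords.append([k])
    return keywords

print(split_keywords(["keyword1, keyword2, keyword3"]))
# [['keyword1'], ['keyword2'], ['keyword3']]
print(split_keywords(["single"]))
# [['single']]
```

In a spider it could be called as `split_keywords(response.xpath('//meta[@name="keywords"]/@content').extract())`.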
Upvotes: 3
Reputation: 111
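Try changing the code to:
'keywords': [[x] for i in response.xpath('//meta[@name="keywords"]/@content').extract() for x in i.split(', ')]
Wrapping each item produced by i.split(', ') in [] generates an individual list per keyword, and the second for clause keeps the result flat instead of nesting it inside one list per meta tag.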
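To check that a two-clause comprehension serializes to exactly the JSON shape asked for in the question, it can be run through the standard `json` module. The sketch again uses a hypothetical `contents` list in place of the Scrapy selector result:

```python
import json

# Hypothetical stand-in for the list returned by .extract()
contents = ["keyword1, keyword2, keyword3"]

# The outer for-clause walks the content strings, the inner one walks the
# keywords; each keyword is wrapped in its own single-element list.
item = {'keywords': [[x] for i in contents for x in i.split(', ')]}
print(json.dumps(item))
# {"keywords": [["keyword1"], ["keyword2"], ["keyword3"]]}
```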
Upvotes: 0