Reputation: 1914
I have a file that is the output of a MapReduce job, so each line is in
key-value format:
'null\t[0, [[0, 21], [1, 4], [2, 5]]]\n'
'null\t[1, [[0, 3], [1, 1], [2, 2]]]\n'
I want to remove all the characters except the second element of each value list:
[[0, 21], [1, 4], [2, 5]]
[[0, 3], [1, 1], [2, 2]]
Finally, I want to collect them into a single list:
[[[0, 21], [1, 4], [2, 5]], [[0, 3], [1, 1], [2, 2]]]
This is my attempt so far:
import re

corpus = []
with open(FILENAME) as f:
    content = f.readlines()
for line in content:
    # Match everything up to the first "[[" and replace it with "["
    clean_line = re.sub(r'^.*?\[\[', '[', line)
    # Remove the "\n" and the last two "]]" of the string
    clean_line = re.sub('[\n]', '', clean_line)[:-2]
    corpus.append(clean_line)
Output:
['[0, 21], [1, 4], [2, 5]', '[0, 3], [1, 1], [2, 2]']
As you can see, each element is still of type str. How can I convert them to type list?
Upvotes: 2
Views: 72
Reputation: 43330
Treat each line as JSON: replace the 'null\t' prefix and the trailing newline so that the line becomes a valid JSON document, parse it, and take the second element of the value:
import json
corpus = [json.loads(line.replace('null\t', '{"a":').replace("\n", "}"))["a"][1] for line in content]
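A variation on the same idea, as a rough sketch: since the part after the tab already looks like valid JSON, it may be enough to split off the key and parse the value directly. The helper name parse_line and the hard-coded sample lines are only illustrative (they are copied from the question), and this assumes every line contains exactly one tab-separated key and value.

```python
import json

def parse_line(line):
    # Hypothetical helper: drop the key (everything before the first tab)
    # and parse the remaining value as JSON.
    _key, _sep, value = line.partition("\t")
    # The value looks like [index, [[a, b], ...]]; keep only the second element.
    return json.loads(value)[1]

# Sample lines taken from the question.
lines = [
    'null\t[0, [[0, 21], [1, 4], [2, 5]]]\n',
    'null\t[1, [[0, 3], [1, 1], [2, 2]]]\n',
]
corpus = [parse_line(line) for line in lines]
print(corpus)
```

Unlike the replace-based one-liner, this does not depend on the key being literally null or on the line ending with a newline.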
Upvotes: 4
Reputation: 1261
As a final step, you can convert the string representations of your lists into actual list objects with ast.literal_eval, like this:
import ast
sample = ['[0, 21], [1, 4], [2, 5]', '[0, 3], [1, 1], [2, 2]']
result = []
for item in sample:
    result.append(list(ast.literal_eval(item)))
And this is the result, containing the desired elements:
[[[0, 21], [1, 4], [2, 5]], [[0, 3], [1, 1], [2, 2]]]
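If you would rather skip the regex cleanup entirely, ast.literal_eval can probably be applied to the raw value as well, since the text after the tab is a valid Python literal in the sample shown. This is only a sketch under that assumption; extract_pairs is a made-up name and the sample lines are the ones from the question.

```python
import ast

def extract_pairs(line):
    # Hypothetical helper: take the text after the first tab and strip
    # surrounding whitespace (including the trailing newline).
    value = line.split("\t", 1)[1].strip()
    # Safely evaluate the literal, e.g. [0, [[0, 21], ...]], and keep
    # the second element, which is the list of pairs.
    return ast.literal_eval(value)[1]

# Sample lines taken from the question.
lines = [
    'null\t[0, [[0, 21], [1, 4], [2, 5]]]\n',
    'null\t[1, [[0, 3], [1, 1], [2, 2]]]\n',
]
result = [extract_pairs(line) for line in lines]
print(result)
```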
Upvotes: 1