Reputation: 181
I have a DataFrame that looks like this:
ID len range_cover
0 A0A075B734 347.0 [36, 134, 136, 283]
1 A0A087X1C5 515.0 [22, 328, 347, 514]
2 A0A1B0GTQ1 446.0 [22, 116, 168, 496]
3 A0A1W2PN81 502.0 [22, 46, 48, 117, 119, 149, 152, 160, 162, 230]
4 Q494W8 412.0 [22, 36, 80, 84, 88, 91, 96, 128, 131, 139, 14...
.. ... ... ...
165 Q9UQ90 795.0 [303, 564]
166 Q9Y210 931.0 [0, 930]
I want to split each list in range_cover into pairs of consecutive numbers, but I don't know how to do it.
Every list has an even number of elements, so this is possible for all of them.
Here's the expected output:
range_cover
[[36, 134], [136, 283]]
[[22, 328], [347, 514]]
[[22, 116], [168, 496]]
[[22, 46], [48, 117], [119, 149], [152, 160], [162, 230]]
[[22, 36], [80, 84], [88, 91], [96, 128], [131, 139], [14...
...
[303, 564]
[0, 930]
I thought about using zip, something like:
df2['tup'] = df2.apply(lambda x: list(zip(x.range_cover)), axis=1)
But I don't know how to tell the function to 'zip' the first number with the second one, the third with the fourth, and so on. I also thought about using .replace, but I would need the function to replace a character after every 2 numbers.
Any help or advice is welcome! cheers
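For reference, the zip idea can be made to work by zipping two strided slices of the same list rather than the list itself. The sketch below is only an illustration in plain Python; pair_up is a hypothetical helper name, not something from the original post.

```python
def pair_up(values):
    """Group a flat list into consecutive pairs: [a, b, c, d] -> [[a, b], [c, d]]."""
    # values[::2] holds the 1st, 3rd, 5th... items; values[1::2] holds the 2nd, 4th, 6th...
    return [list(pair) for pair in zip(values[::2], values[1::2])]


flat = [22, 46, 48, 117, 119, 149, 152, 160, 162, 230]
pairs = pair_up(flat)
print(pairs)

# On the DataFrame from the question this would presumably be applied per row, e.g.
# df2['range_cover'] = df2['range_cover'].apply(pair_up)
```

Note that zip stops at the shorter slice, so a list with an odd number of items would silently lose its last element.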
Upvotes: 0
Views: 94
Reputation: 3720
Via transform() and a list comprehension:
df['range_cover'].transform(lambda x: [x[i:i+2] for i in range(0,len(x),2)])
0 [[36, 134], [136, 283]]
1 [[22, 328], [347, 514]]
2 [[22, 116], [168, 496]]
3 [[22, 46], [48, 117], [119, 149], [152, 160], ...
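A self-contained sketch of the same slicing approach, assuming a small DataFrame built from two rows of the question's sample. The helper name chunk_pairs is made up for illustration. transform() returns a new Series, so the result has to be assigned back to keep it.

```python
import pandas as pd

# Two rows copied from the question's sample data
df = pd.DataFrame({
    "ID": ["A0A075B734", "Q9UQ90"],
    "range_cover": [[36, 134, 136, 283], [303, 564]],
})


def chunk_pairs(x):
    # Step through the list two items at a time and slice out each pair
    return [x[i:i + 2] for i in range(0, len(x), 2)]


# Assign the transformed Series back to the column to keep the paired lists
df["range_cover"] = df["range_cover"].transform(chunk_pairs)
result = df["range_cover"].tolist()
print(result)
```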
Upvotes: 1
Reputation: 31146
numpy's reshape() is a simple solution for this:
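import io
import json
import numpy as np
import pandas as pd

# Columns are separated by two or more spaces; the lists only contain single spaces
df = pd.read_csv(io.StringIO("""ID  len  range_cover
0  A0A075B734  347.0  [36, 134, 136, 283]
1  A0A087X1C5  515.0  [22, 328, 347, 514]
2  A0A1B0GTQ1  446.0  [22, 116, 168, 496]
3  A0A1W2PN81  502.0  [22, 46, 48, 117, 119, 149, 152, 160, 162, 230]"""), sep=r"\s\s+", engine="python")
# Parse the list strings into Python lists
df["range_cover"] = df["range_cover"].apply(json.loads)
# Reshape each flat list into a 2-column array of pairs
df["range_cover"].apply(lambda l: np.array(l).reshape(len(l)//2, 2))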
Upvotes: 3