Reputation: 55
I have a comma separated series of integer values that I'd like to resample so that I have twice as many, where a new value is added half way between each of the existing values. For example, if this is my source:
1,5,11,9,13,21
the result would be:
1,3,5,8,11,10,9,11,13,17,21
In case that's not clear, I'm trying to add a number between each of the values in my source series, like this:
1 5 11 9 13 21
1 3 5 8 11 10 9 11 13 17 21
I've searched quite a bit and it seems that something like scipy.signal.resample or pandas should work, but I'm completely new at this and I haven't been able to get it working. For example, here's one of my attempts with scipy:
import numpy as np
from scipy import signal
InputFileName = "sample.raw"
DATA250 = np.loadtxt(InputFileName, delimiter=',', dtype=int);
print(DATA250)
DATA500 = signal.resample(DATA250, 11)
print(DATA500)
Which outputs:
[ 1 5 11 9 13 21]
[ 1. -0.28829461 6.12324489 10.43251996 10.9108191 9.84503237
8.40293529 10.7641676 18.44182898 21.68506897 12.68267746]
Obviously I'm using signal.resample incorrectly. Is there a way I can do this with signal.resample or pandas? Should I be using some other method?
Also, in my example all of the source numbers have an integer halfway in between. In my actual data, that won't be the case. So if two of the numbers are 10,15, the new number would be 12.5. However, I'd like all of the resulting numbers to be integers, so the new number that gets inserted would need to be either 12 or 13 (it doesn't matter to me which).
Note that once I get this working, the source file will actually be a comma separated list of 2,000 numbers and the output should be 4,000 numbers (or technically 3,999 since there won't be one added to the end). Also, this is going to be used to process something similar to an ECG recording. Currently the ECG is sampled at 250 Hz for 8 seconds, and the recording is then passed to a separate process for analysis. However, that separate process needs the recording to be sampled at 500 Hz. So the workflow will be that I'll take a 250 Hz recording every 8 seconds and upsample it to 500 Hz, then pass the resulting output to the analysis process.
Thanks for any guidance you can provide.
Upvotes: 3
Views: 7807
Reputation: 3550
Since the interpolation is simple, you can do it by hand:
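import numpy as np
a = np.array([1,5,11,9,13,21])
# Use the input's dtype: an unsigned dtype would wrap any negative samples
b = np.zeros(2*len(a)-1, dtype=a.dtype)
b[0::2] = a  # original samples go in the even slots
b[1::2] = (a[:-1] + a[1:]) // 2  # floored midpoints go in the odd slots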
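As a quick check of the by-hand approach, here is a minimal sketch wrapped in a function. The name insert_midpoints is hypothetical, and it assumes at least one sample and a signed integer dtype so that negative samples are handled. Floor division means a half-way value such as 12.5 becomes 12.

```python
import numpy as np


def insert_midpoints(a):
    """Return a with the floored midpoint inserted between each pair of neighbours.

    Hypothetical helper; assumes a holds at least one integer sample.
    """
    a = np.asarray(a, dtype=np.int64)
    b = np.empty(2 * len(a) - 1, dtype=a.dtype)
    b[0::2] = a                      # original samples in the even slots
    b[1::2] = (a[:-1] + a[1:]) // 2  # floored midpoints in the odd slots
    return b


sample = insert_midpoints([1, 5, 11, 9, 13, 21])
halfway = insert_midpoints([10, 15])
negative = insert_midpoints([-3, 0])
print(sample.tolist())
print(halfway.tolist())
print(negative.tolist())
```

Note that // rounds toward negative infinity, so the midpoint of -3 and 0 comes out as -2 rather than -1.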
You can also use scipy.signal.resample this way:
import numpy as np
from scipy import signal
a = np.array([1,5,11,9,13,21])
b = signal.resample(a, len(a) * 2)
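# Round to nearest before converting; astype(int) alone truncates, so 4.999... would become 4
b_int = np.rint(b).astype(int)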
The trick is to request exactly twice the number of elements, so that every other output point (b[0], b[2], ...) lines up with one of your initial points. Also, I think that the Fourier interpolation done by scipy.signal.resample is better for your ECG signal than the linear interpolation you're asking for.
Upvotes: 4
Reputation: 365815
Although I probably would just use NumPy here, pretty similar to J. Martinot-Lagarde's answer, you don't actually have to.
First, you can read a single row of comma-separated numbers with just the csv module:
import csv
with open(path) as f:
    numbers = map(int, next(csv.reader(f)))
… or just string operations:
with open(path) as f:
    numbers = map(int, next(f).split(','))
And then you can interpolate that easily:
def interpolate(numbers):
    last = None
    for number in numbers:
        if last is not None:
            yield (last+number)//2
        yield number
        last = number
If you want it to be fully general and reusable, just take a function argument and yield function(last, number), and replace None with sentinel = object().
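The generalization described above could look something like this minimal sketch. The names interpolate_with, midpoint and _SENTINEL are hypothetical. The private sentinel object is there so that None can appear as a legitimate value in the input.

```python
_SENTINEL = object()  # unique marker meaning "no previous value yet"


def interpolate_with(numbers, function):
    """Yield each item, preceded by function(previous, item) after the first."""
    last = _SENTINEL
    for number in numbers:
        if last is not _SENTINEL:
            yield function(last, number)
        yield number
        last = number


def midpoint(a, b):
    """Floored integer midpoint of a and b."""
    return (a + b) // 2


result = list(interpolate_with([1, 5, 11, 9, 13, 21], midpoint))
print(result)
```

Because it is a generator, nothing is computed until the caller iterates over it, which keeps it lazy like the original.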
And now, all you need to do is join the results and write them:
with open(outpath, 'w') as f:
    f.write(','.join(map(str, interpolate(numbers))))
Are there any advantages to this solution? Well, other than the read/split and join/write, it's purely lazy. And we can write lazy split and join functions pretty easily (or just do it manually). So if you ever had to deal with a billion comma-separated numbers instead of a thousand, that's all you'd have to change.
Here's a lazy split:
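def isplit(s, sep):
    start = 0
    while True:
        nextpos = s.find(sep, start)
        if nextpos == -1:
            # no more separators: yield the tail and stop
            yield s[start:]
            return
        yield s[start:nextpos]
        # skip the whole separator, not just one character
        start = nextpos + len(sep)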
And you can use an mmap as a lazily-read string (well, bytes, but our data are pure ASCII, so that's fine):
import mmap
with open(path, 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        numbers = map(int, isplit(mm, b','))
And let's use a different solution for lazy writing, just for variety:
def icsvwrite(f, seq, sep=','):
    # seq must be an iterator (e.g. the result of map), not a list
    first = next(seq, None)
    if first is None: return
    f.write(first)
    for value in seq:
        f.write(sep)
        f.write(value)
So, putting it all together:
with open(inpath, 'rb') as inf, open(outpath, 'w') as outf:
    with mmap.mmap(inf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        numbers = map(int, isplit(mm, b','))
        icsvwrite(outf, map(str, interpolate(numbers)))
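For anyone who wants to run the lazy pipeline end to end, here is a self-contained sketch that restates the pieces above and round-trips the question's sample through a temporary directory. The function name upsample_file and the file names in.csv and out.csv are assumptions made for illustration. It also assumes the input file is non-empty, since mmap cannot map an empty file.

```python
import mmap
import os
import tempfile


def isplit(s, sep):
    """Lazily yield the pieces of s between occurrences of sep."""
    start = 0
    while True:
        nextpos = s.find(sep, start)
        if nextpos == -1:
            yield s[start:]
            return
        yield s[start:nextpos]
        start = nextpos + len(sep)


def interpolate(numbers):
    """Yield each number, preceded by the floored midpoint with its predecessor."""
    last = None
    for number in numbers:
        if last is not None:
            yield (last + number) // 2
        yield number
        last = number


def icsvwrite(f, seq, sep=","):
    """Write the strings in seq to f one at a time, separated by sep."""
    seq = iter(seq)
    first = next(seq, None)
    if first is None:
        return
    f.write(first)
    for value in seq:
        f.write(sep)
        f.write(value)


def upsample_file(inpath, outpath):
    """Hypothetical wrapper: read one CSV row lazily and write the upsampled row."""
    with open(inpath, "rb") as inf, open(outpath, "w") as outf:
        with mmap.mmap(inf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # int() accepts ASCII bytes and ignores a trailing newline
            numbers = map(int, isplit(mm, b","))
            # consume the lazy chain while the mmap is still open
            icsvwrite(outf, map(str, interpolate(numbers)))


with tempfile.TemporaryDirectory() as tmp:
    inpath = os.path.join(tmp, "in.csv")
    outpath = os.path.join(tmp, "out.csv")
    with open(inpath, "w") as f:
        f.write("1,5,11,9,13,21\n")
    upsample_file(inpath, outpath)
    with open(outpath) as f:
        output_text = f.read()

print(output_text)
```

The write has to happen inside the mmap block: the map objects are lazy, so nothing is read from the file until icsvwrite starts pulling values.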
But, even though I was able to slap this together pretty quickly, and all of the pieces are nicely reusable, I'd still probably use NumPy for your specific problem. You're not going to read a row of a billion numbers. You already have NumPy installed on the only machine that's ever going to run this script. The cost of importing it every 8 seconds is negligible (and you can avoid it entirely by having the script sleep between runs). So, it's hard to beat an elegant 3-line solution.
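For comparison, a file-to-file NumPy version might look like the sketch below. The function name upsample_file_numpy and the temporary file names are assumptions. It expects a single row of comma-separated integers and uses floor division for the midpoints.

```python
import os
import tempfile

import numpy as np


def upsample_file_numpy(inpath, outpath):
    """Hypothetical helper: read one CSV row, insert midpoints, write one CSV row."""
    # ndmin=1 keeps the result one-dimensional even for a single value
    a = np.loadtxt(inpath, delimiter=",", dtype=int, ndmin=1)
    b = np.empty(2 * len(a) - 1, dtype=a.dtype)
    b[0::2] = a
    b[1::2] = (a[:-1] + a[1:]) // 2
    # savetxt writes one array row per line, so reshape to a single row
    np.savetxt(outpath, b.reshape(1, -1), fmt="%d", delimiter=",")


with tempfile.TemporaryDirectory() as tmp:
    inpath = os.path.join(tmp, "in.csv")
    outpath = os.path.join(tmp, "out.csv")
    with open(inpath, "w") as f:
        f.write("1,5,11,9,13,21\n")
    upsample_file_numpy(inpath, outpath)
    with open(outpath) as f:
        numpy_output = f.read().strip()

print(numpy_output)
```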
Upvotes: 1
Reputation: 12410
Since you suggested a pandas solution, here is one possibility:
import pandas as pd
import numpy as np
l = [1,4,11,9,14,21]
n = len(l)
df = pd.DataFrame(l, columns = ["l"]).reindex(np.linspace(0, n-1, 2*n-1)).interpolate().astype(int)
print(df)
It feels unnecessarily complicated, though. I've added the pandas tag so that people more familiar with pandas functionality will see it.
Upvotes: 0