An SO User

Reputation: 24998

TargetDataLine and Xuggler to record audio with a video of the screen

TargetDataLine is, for me so far, the easiest way to capture microphone input in Java. I want to encode the audio that I capture together with a video of the screen (in a screen-recorder application) so that the user can create a tutorial, slidecast, etc.
I use Xuggler to encode the video.
Xuggler does have a tutorial on encoding audio together with video, but it takes its audio from a file. In my case, the audio is live.
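For reference, capturing from the microphone with TargetDataLine looks roughly like this (a minimal sketch; the format parameters are just an example, 16-bit signed little-endian mono PCM at 44.1 kHz):

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

public class MicCaptureSketch {
    public static void main(String[] args) throws Exception {
        // 44.1 kHz, 16-bit, mono, signed, little-endian PCM
        AudioFormat format = new AudioFormat(44100f, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        line.open(format);
        line.start();

        byte[] buffer = new byte[4096];
        // read(...) blocks until the requested number of bytes has been captured
        int bytesRead = line.read(buffer, 0, buffer.length);
        System.out.println("captured " + bytesRead + " bytes of PCM audio");

        line.stop();
        line.close();
    }
}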



To encode the video I use com.xuggle.mediatool.IMediaWriter. The IMediaWriter object lets me add a video stream and has an
encodeAudio(int streamIndex, short[] samples, long timeStamp, TimeUnit timeUnit)
method. I could use that if I could get the samples from the TargetDataLine as short[], but TargetDataLine.read(...) returns byte[].
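A sketch of that conversion (assuming 16-bit signed PCM; the byte order must match the AudioFormat the line was opened with):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Turn the byte[] filled by TargetDataLine.read(...) into the short[]
// that IMediaWriter.encodeAudio(...) expects (one short per 16-bit sample).
public static short[] toSamples(byte[] pcm, int byteCount, boolean bigEndian) {
    short[] samples = new short[byteCount / 2];
    ByteBuffer.wrap(pcm, 0, byteCount)
              .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN)
              .asShortBuffer()
              .get(samples);
    return samples;
}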
So my two questions are:

How can I encode the live audio with video?

How do I maintain the proper timing of the audio packets so that they are encoded at the proper time?

References:
1. JavaDoc for TargetDataLine: http://docs.oracle.com/javase/1.4.2/docs/api/javax/sound/sampled/TargetDataLine.html
2. Xuggler Documentation: http://build.xuggle.com/view/Stable/job/xuggler_jdk5_stable/javadoc/java/api/index.html



Update

My code for capturing video:

public void run() {
    final IRational FRAME_RATE = IRational.make(frameRate, 1);
    final IMediaWriter writer = ToolFactory.makeWriter(completeFileName);
    writer.addVideoStream(0, 0, FRAME_RATE, recordingArea.width, recordingArea.height);
    long startTime = System.nanoTime();

    while (keepCapturing) {
        image = bot.createScreenCapture(recordingArea);
        PointerInfo pointerInfo = MouseInfo.getPointerInfo();
        Point globalPosition = pointerInfo.getLocation();

        // translate the global mouse position into the recorded region
        int relativeX = globalPosition.x - recordingArea.x;
        int relativeY = globalPosition.y - recordingArea.y;

        BufferedImage bgr = convertToType(image, BufferedImage.TYPE_3BYTE_BGR);
        if (cursor != null) {
            bgr.getGraphics().drawImage(((ImageIcon) cursor).getImage(), relativeX, relativeY, null);
        }
        try {
            writer.encodeVideo(0, bgr, System.nanoTime() - startTime, TimeUnit.NANOSECONDS);
        } catch (Exception e) {
            writer.close();
            JOptionPane.showMessageDialog(null,
                    "Recording will stop abruptly because " +
                    "an error has occurred", "Error", JOptionPane.ERROR_MESSAGE, null);
            return; // the writer is already closed, so leave the capture loop
        }

        try {
            sleep(sleepTime);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    writer.close();
}

Upvotes: 1

Views: 3215

Answers (1)

Alex I

Reputation: 20287

I answered most of that recently under this question: Xuggler encoding and muxing

Code sample:

writer.addVideoStream(videoStreamIndex, 0, videoCodec, width, height);
writer.addAudioStream(audioStreamIndex, 0, audioCodec, channelCount, sampleRate);

while (... have more data ...)
{
    BufferedImage videoFrame = ...;
    long videoFrameTime = ...; // this is the time to display this frame
    writer.encodeVideo(videoStreamIndex, videoFrame, videoFrameTime, DEFAULT_TIME_UNIT);

    short[] audioSamples = ...; // the size of this array should be number of samples * channelCount
    long audioSamplesTime = ...; // this is the time to play back this bit of audio
    writer.encodeAudio(audioStreamIndex, audioSamples, audioSamplesTime, DEFAULT_TIME_UNIT);
}

In the case of TargetDataLine, getMicrosecondPosition() will tell you the time you need for audioSamplesTime. It appears to count from the moment the TargetDataLine was opened. You then need a video timestamp referenced to the same clock, which depends on the video device and/or how you capture video. The absolute values do not matter as long as both streams use the same clock. You could subtract the initial value (at the start of the stream) from both your video and your audio times so that the two timelines line up; that is only an approximate match, but probably close enough in practice.
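A sketch of that alignment, assuming System.nanoTime() drives the video clock (as in the capture loop above) and the line's microsecond position drives the audio clock; the class and helper names are illustrative, not part of Xuggler:

import java.awt.image.BufferedImage;
import java.util.concurrent.TimeUnit;
import javax.sound.sampled.TargetDataLine;
import com.xuggle.mediatool.IMediaWriter;

// Both streams get timestamps measured from the same starting moment,
// so the muxer sees one consistent timeline.
class ClockAlignment {
    private final long videoStartNanos;   // reference point for the video clock
    private final long audioStartMicros;  // reference point for the audio clock

    ClockAlignment(TargetDataLine line) {
        this.videoStartNanos = System.nanoTime();
        this.audioStartMicros = line.getMicrosecondPosition();
    }

    void writeVideo(IMediaWriter writer, int streamIndex, BufferedImage frame) {
        long t = (System.nanoTime() - videoStartNanos) / 1000L; // micros since start
        writer.encodeVideo(streamIndex, frame, t, TimeUnit.MICROSECONDS);
    }

    void writeAudio(IMediaWriter writer, int streamIndex, TargetDataLine line, short[] samples) {
        long t = line.getMicrosecondPosition() - audioStartMicros; // micros since start
        writer.encodeAudio(streamIndex, samples, t, TimeUnit.MICROSECONDS);
    }
}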

You need to call encodeVideo and encodeAudio in strictly increasing order of time; you may have to buffer some audio and some video to make sure you can do that. More details here.
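One way to do that buffering is to queue timestamped packets from both capture threads and always encode the oldest one first. A rough sketch (all class and method names here are illustrative):

import java.awt.image.BufferedImage;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.TimeUnit;
import com.xuggle.mediatool.IMediaWriter;

class InterleavingMuxer {
    // A packet is either one video frame or one chunk of audio samples.
    static final class Packet {
        final long timeMicros;
        final BufferedImage frame; // null for audio packets
        final short[] samples;     // null for video packets
        Packet(long timeMicros, BufferedImage frame, short[] samples) {
            this.timeMicros = timeMicros;
            this.frame = frame;
            this.samples = samples;
        }
    }

    private final PriorityQueue<Packet> queue =
            new PriorityQueue<Packet>(Comparator.comparingLong((Packet p) -> p.timeMicros));

    synchronized void offerVideo(long timeMicros, BufferedImage frame) {
        queue.add(new Packet(timeMicros, frame, null));
    }

    synchronized void offerAudio(long timeMicros, short[] samples) {
        queue.add(new Packet(timeMicros, null, samples));
    }

    // Drain everything queued so far, oldest first, so each encode call
    // sees timestamps in increasing order.
    synchronized void drain(IMediaWriter writer, int videoIndex, int audioIndex) {
        for (Packet p; (p = queue.poll()) != null; ) {
            if (p.frame != null) {
                writer.encodeVideo(videoIndex, p.frame, p.timeMicros, TimeUnit.MICROSECONDS);
            } else {
                writer.encodeAudio(audioIndex, p.samples, p.timeMicros, TimeUnit.MICROSECONDS);
            }
        }
    }
}

The capture threads call offerVideo/offerAudio as data arrives, and a single muxing thread calls drain periodically, so the writer is only touched from one thread and always in timestamp order.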

Upvotes: 2
