Unless silence is an aesthetic choice, cutscenes in video games tend to pair video with an audio track, and Unity’s Timeline makes it easy to combine the two.
Creating an Audio Source
Before we can add audio to the Timeline, we need a game object that holds an Audio Source.
You can either attach the Audio Source component to an existing game object in the scene, or create an empty game object and attach the component to that.
Whichever option you choose, once the Audio Source component is attached, assign your Audio Clip to it and you’re done!
Adding the Audio Source to the Timeline
After you create your timeline, you can add an Audio Track from the Timeline window:
Right-click -> Audio Track
Now drag the game object that holds your Audio Source onto the Audio Track you just created in the Timeline.
Once the game object is bound to the Audio Track, click the three dots on the far right of the Audio Track and select Add From Audio Clip.
Find the audio clip you want to add, and you’re done! The audio clip is now on the timeline and will play whenever the cutscene plays.
Timing is Perfect
With the audio clip in place, you can adjust the camera clips to fit it. If a view needs more time than you originally planned, hover over either end of the camera clip and drag that end away from the middle of the clip to extend the view.
The reverse works too: if a camera view needs to be cut short because the audio isn’t as long as you anticipated, hover over an endpoint and drag it toward the middle of the clip instead.
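The dragging above is really just arithmetic on start times and durations. As a rough illustration only, here is a minimal Python sketch (not Unity code) that stretches or trims the last camera clip so the cutscene ends exactly when the audio does. The `CameraClip` class, the `fit_last_clip_to_audio` helper, and the sample numbers are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class CameraClip:
    """Hypothetical stand-in for a camera clip on the timeline."""
    name: str
    start: float     # seconds from the beginning of the timeline
    duration: float  # seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def fit_last_clip_to_audio(clips, audio_length):
    """Return new clips whose final clip ends exactly at audio_length."""
    ordered = sorted(clips, key=lambda c: c.start)
    # Drop any clip that would only start once the audio is already over.
    kept = [c for c in ordered if c.start < audio_length]
    if not kept:
        return []
    last = kept[-1]
    # Extending or trimming is the same operation: move the end point.
    kept[-1] = CameraClip(last.name, last.start, audio_length - last.start)
    return kept


# Assumed example values: two camera views and a 9.5 second audio clip.
clips = [CameraClip("Wide shot", 0.0, 4.0), CameraClip("Close-up", 4.0, 3.0)]
fitted = fit_last_clip_to_audio(clips, audio_length=9.5)
for clip in fitted:
    print(f"{clip.name}: {clip.start:.1f}s -> {clip.end:.1f}s")
```

In this sketch the close-up is extended from 7.0 to 9.5 seconds, which mirrors dragging its right edge away from the middle in the Timeline window.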
Using the Audio Track as Reference
Depending on your Unity version, you may be able to preview the clip from the Timeline window without actually playing the game itself. This lets you edit the camera views without waiting for Unity to build and generate everything, which is especially handy for an introduction or exit cutscene.
Moreover, you can zoom in and use the waveform to precisely edit each camera’s start and end points. For instance, if a piece of dialog requires several camera viewpoints, zooming in on the waveform usually shows you almost exactly where to end the first viewpoint and start the second.
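What you are doing by eye here is looking for a flat, quiet stretch in the waveform. Purely to illustrate that idea, the Python sketch below (again, not Unity code) scans a list of audio samples and suggests the middle of the longest quiet gap as a cut point. It assumes mono samples normalized to the range -1 to 1; the function names, the window size, the threshold, and the synthetic test signal are all assumptions made for the example.

```python
import math


def window_peaks(samples, window_size):
    """Peak absolute amplitude of each consecutive window of samples."""
    return [
        max(abs(s) for s in samples[i:i + window_size])
        for i in range(0, len(samples), window_size)
    ]


def suggest_cut_time(samples, sample_rate, window_seconds=0.05, threshold=0.05):
    """Return the midpoint (seconds) of the longest quiet stretch, or None."""
    window_size = max(1, int(sample_rate * window_seconds))
    peaks = window_peaks(samples, window_size)
    best_start, best_len, run_start = None, 0, None
    # The trailing infinity acts as a sentinel that closes a final quiet run.
    for i, peak in enumerate(peaks + [math.inf]):
        if peak < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    if best_start is None:
        return None
    return (best_start + best_len / 2) * window_size / sample_rate


def tone(seconds, sample_rate, amplitude=0.5, freq=50.0):
    """Synthetic stand-in for a line of dialog."""
    count = int(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(count)]


# Assumed signal: 2.0 s of "dialog", a 0.4 s pause, then 1.5 s more "dialog".
rate = 1000
samples = tone(2.0, rate) + [0.0] * 400 + tone(1.5, rate)
cut = suggest_cut_time(samples, rate)
print(f"Suggested camera cut at {cut:.2f}s")
```

The suggested time falls in the middle of the pause between the two bursts of sound, which is the same spot you would pick by zooming in on the waveform and ending one camera view where the line goes flat.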