How to establish a P2P voice chat session over XMPP, part 2

Among the XMPP stanzas (?) used by Jingle, there is a <description> element.
# Is this the right way to use the word "stanza"?

<content creator="initiator" name="voice">
  <description xmlns="urn:xmpp:tmp:jingle:apps:rtp" media="audio">
    <payload-type id="97" name="speex" clockrate="8000"/>
    <payload-type id="18" name="G729"/>
  </description>
  <transport xmlns="urn:xmpp:tmp:jingle:transports:ice-udp">
    <candidate component="1"
               foundation="1"
               generation="0"
               ip="192.0.2.3"
               network="1"
               port="45664"
               priority="1678246398"
               protocol="udp"
               pwd="asd88fgpdd777uzjYhagZg"
               type="srflx"
               ufrag="8hhy"/>
  </transport>
</content>

The contents of <description> are not covered in XEP-0166: Jingle.
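
To get a feel for what is inside that <content> element, here is a rough Python sketch that pulls out the payload types and the ICE candidate with the standard library's xml.etree.ElementTree. It assumes the child elements are named payload-type, as XEP-0167 defines them, and the function name summarize_content is just something I made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in the example stanza (the "tmp" ones).
RTP_NS = "urn:xmpp:tmp:jingle:apps:rtp"
ICE_NS = "urn:xmpp:tmp:jingle:transports:ice-udp"

CONTENT_XML = """
<content creator='initiator' name='voice'>
  <description xmlns='urn:xmpp:tmp:jingle:apps:rtp' media='audio'>
    <payload-type id='97' name='speex' clockrate='8000'/>
    <payload-type id='18' name='G729'/>
  </description>
  <transport xmlns='urn:xmpp:tmp:jingle:transports:ice-udp'>
    <candidate component='1' foundation='1' generation='0'
               ip='192.0.2.3' network='1' port='45664'
               priority='1678246398' protocol='udp'
               pwd='asd88fgpdd777uzjYhagZg' type='srflx' ufrag='8hhy'/>
  </transport>
</content>
"""


def summarize_content(xml_text):
    """Return the media type, payload types and candidates of one <content>."""
    content = ET.fromstring(xml_text)
    description = content.find(f"{{{RTP_NS}}}description")
    transport = content.find(f"{{{ICE_NS}}}transport")
    # clockrate is optional in the example (G729 has none), so it may be None.
    payloads = [
        (int(pt.get("id")), pt.get("name"), pt.get("clockrate"))
        for pt in description.findall(f"{{{RTP_NS}}}payload-type")
    ]
    candidates = [
        (c.get("ip"), int(c.get("port")), c.get("type"))
        for c in transport.findall(f"{{{ICE_NS}}}candidate")
    ]
    return {
        "media": description.get("media"),
        "payloads": payloads,
        "candidates": candidates,
    }
```

The <description> part is the application side (what XEP-0167 covers) and the <transport> part is the ICE-UDP side, which is why the sketch looks them up under two different namespaces.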

XEP-0166 says:

Naturally, more complex scenarios are probable; such scenarios are described in other specifications, such as XEP-0167 for voice chat.

so let's follow that pointer.

XEP-0167: Jingle RTP Sessions
This specification defines a Jingle application type for negotiating one or more sessions that use the Real-time Transport Protocol (RTP) to exchange media such as voice or video. The application type includes a straightforward mapping to Session Description Protocol (SDP) for interworking with SIP media endpoints.
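It defines an application type for negotiating one or more sessions that use RTP to exchange media such as voice or video.
The application type includes a straightforward mapping to SDP so that it can interwork with SIP media endpoints.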

I don't really understand what "application type" means, but
XEP-0166 Jingle talks about a

wide variety of application types (e.g., voice chat, video chat, file sharing)

so I guess I can think of it as roughly "the kind of P2P application".

New terms I don't fully understand yet
・RTP
・SDP

RTP is the Real-time Transport Protocol
RTP: A Transport Protocol for Real-Time Applications

RTP
provides end-to-end network transport functions suitable for
applications transmitting real-time data, such as audio, video or
simulation data, over multicast or unicast network services. RTP
does not address resource reservation and does not guarantee
quality-of-service for real-time services. The data transport is
augmented by a control protocol (RTCP) to allow monitoring of the
data delivery in a manner scalable to large multicast networks, and
to provide minimal control and identification functionality. RTP and
RTCP are designed to be independent of the underlying transport and
network layers. The protocol supports the use of RTP-level
translators and mixers.

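RTP provides end-to-end transport functions for applications that send real-time data such as audio, video or simulation data over multicast or unicast network services. It does not address resource reservation and does not guarantee quality of service for real-time services.
Data transport is augmented by a control protocol (RTCP), which allows data delivery to be monitored in a way that scales to large multicast networks and provides minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports RTP-level translators and mixers.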

provides end-to-end delivery services for data with real-time
characteristics, such as interactive audio and video. Those services
include payload type identification, sequence numbering, timestamping
and delivery monitoring.

In other words: payload type identification, sequence numbering, timestamping, and delivery monitoring.

Whoa, the RTP RFC is 104 pages long; no way I can read all of it. I'll just pick out excerpts.

While RTP is primarily designed to satisfy the needs of multi-
participant multimedia conferences, it is not limited to that

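RTP was designed primarily for multi-participant multimedia conferences (although it is not limited to that).
I see, so that's why the CSRC and SSRC that come up later are in there.
Knowing what a protocol was designed for makes its design easier to understand, which is interesting.
For example, when thinking about how SIP and XMPP differ.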

RTP packet: A data packet consisting of the fixed RTP header, a
possibly empty list of contributing sources (see below), and the
payload data. Some underlying protocols may require an
encapsulation of the RTP packet to be defined. Typically one
packet of the underlying protocol contains a single RTP packet,
but several RTP packets MAY be contained if permitted by the
encapsulation method (see Section 11).

Synchronization source (SSRC): The source of a stream of RTP
packets, identified by a 32-bit numeric SSRC identifier carried in
the RTP header so as not to be dependent upon the network address.
All packets from a synchronization source form part of the same
timing and sequence number space, so a receiver groups packets by
synchronization source for playback. Examples of synchronization
sources include the sender of a stream of packets derived from a
signal source such as a microphone or a camera, or an RTP mixer
(see below). A synchronization source may change its data format,
e.g., audio encoding, over time. The SSRC identifier is a
randomly chosen value meant to be globally unique within a
particular RTP session (see Section 8). A participant need not
use the same SSRC identifier for all the RTP sessions in a
multimedia session; the binding of the SSRC identifiers is
provided through RTCP (see Section 6.5.1). If a participant
generates multiple streams in one RTP session, for example from
separate video cameras, each MUST be identified as a different
SSRC.

Contributing source (CSRC): A source of a stream of RTP packets
that has contributed to the combined stream produced by an RTP
mixer (see below). The mixer inserts a list of the SSRC
identifiers of the sources that contributed to the generation of a
particular packet into the RTP header of that packet. This list
is called the CSRC list. An example application is audio
conferencing where a mixer indicates all the talkers whose speech
was combined to produce the outgoing packet, allowing the receiver
to indicate the current talker, even though all the audio packets
contain the same SSRC identifier (that of the mixer).

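An RTP packet is the fixed RTP header, a (possibly empty) list of contributing sources, and the payload data.
The SSRC is an ID for a source such as a microphone, a video camera (or an RTP mixer), unique within an RTP session.
The receiver uses it as the key to group packets, puts them in order by sequence number and timestamp, and plays them back.
The CSRC list is attached when the SSRC is an RTP mixer (?), so that even when the mixer has merged several RTP streams together,
the receiver can still tell who is speaking.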
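
As a note to myself, the "group by SSRC, then order by sequence number" idea could look something like this in Python. This is only a sketch under big simplifying assumptions: packets are hypothetical (ssrc, seq, payload) tuples, and it ignores the 16-bit sequence number wraparound, timestamps and jitter buffering that a real receiver has to deal with.

```python
from collections import defaultdict


def group_for_playback(packets):
    """Group (ssrc, seq, payload) tuples by SSRC and order each group by seq.

    Simplification: sequence numbers are compared as plain integers, so
    wraparound from 65535 back to 0 is NOT handled here.
    """
    streams = defaultdict(list)
    for ssrc, seq, payload in packets:
        streams[ssrc].append((seq, payload))
    return {
        ssrc: [payload for _, payload in sorted(items, key=lambda item: item[0])]
        for ssrc, items in streams.items()
    }
```

For example, packets arriving as (0xAAAA, 2), (0xBBBB, 1), (0xAAAA, 1) end up as two separate streams, each in sequence order.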

The RTP header
The RTP header has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

/*
 * RTP data header
 * (bit-field order of the RFC's big-endian variant; u_int32 is the
 * RFC's own typedef for a 32-bit unsigned integer)
 */
typedef struct {
    unsigned int version:2;   /* protocol version */
    unsigned int p:1;         /* padding flag */
    unsigned int x:1;         /* header extension flag */
    unsigned int cc:4;        /* CSRC count */
    unsigned int m:1;         /* marker bit */
    unsigned int pt:7;        /* payload type */
    unsigned int seq:16;      /* sequence number */
    u_int32 ts;               /* timestamp */
    u_int32 ssrc;             /* synchronization source */
    u_int32 csrc[1];          /* optional CSRC list */
} rtp_hdr_t;
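
To check my reading of the header layout, here is a minimal Python sketch that unpacks the 12-byte fixed header plus the CSRC list using the standard struct module. The function name parse_rtp_header and the dictionary keys are my own, and it deliberately does not parse the header extension or padding.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header and CSRC list (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # First byte: V(2) P(1) X(1) CC(4); second byte: M(1) PT(7).
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    csrc_end = 12 + 4 * cc
    if len(packet) < csrc_end:
        raise ValueError("packet truncated inside the CSRC list")
    csrc = list(struct.unpack(f"!{cc}I", packet[12:csrc_end]))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "cc": cc,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
        "csrc_end": csrc_end,  # offset just past the CSRC list
    }
```

Using explicit shifts and masks instead of bit-fields sidesteps the byte-order question that the C struct has to worry about.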

A set of default mappings for
audio and video is specified in the companion RFC 3551

Payload types are described in RFC 3551.

RTP Profile for Audio and Video Conferences with Minimal Control
Abstract

This document describes a profile called “RTP/AVP” for the use of the
real-time transport protocol (RTP), version 2, and the associated
control protocol, RTCP, within audio and video multiparticipant
conferences with minimal control. It provides interpretations of
generic fields within the RTP specification suitable for audio and
video conferences. In particular, this document defines a set of
default mappings from payload type numbers to encodings.

This document also describes how audio and video data may be carried
within RTP. It defines a set of standard encodings and their names
when used within RTP. The descriptions provide pointers to reference
implementations and the detailed standards. This document is meant
as an aid for implementors of audio, video and other real-time
multimedia applications.

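It describes a profile called "RTP/AVP" for using RTP version 2 and RTCP in multi-participant audio and video conferences with minimal control.
It also explains how audio and video data are carried within RTP.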
thanks for the aid!!

The table of values for the PT field of the RTP packet.

PT   encoding    media type  clock rate   channels
     name                    (Hz)
___________________________________________________
0    PCMU        A            8,000       1
1    reserved    A
2    reserved    A
3    GSM         A            8,000       1
4    G723        A            8,000       1
5    DVI4        A            8,000       1
6    DVI4        A           16,000       1
7    LPC         A            8,000       1
8    PCMA        A            8,000       1
9    G722        A            8,000       1
10   L16         A           44,100       2
11   L16         A           44,100       1
12   QCELP       A            8,000       1
(continued...)

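The RFC also gives an overview of each audio codec with pointers to the relevant RFCs and standards, and says what to do when you want to create a new payload type (that is, a pointer to the spec for that).
There is no entry for speex, so I guess the 96 or 97 used in the Jingle spec is the way to go?
It is probably written down somewhere, so I'll look it up eventually.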
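
For what it's worth, RFC 3551 sets aside payload type numbers 96 to 127 as "dynamic", meaning their encoding is agreed out of band rather than fixed by the table, which would explain why speex shows up as 97 in the Jingle example. A small Python sketch of that lookup, using only the static audio entries quoted above plus G729 (PT 18) from the same RFC table; the names below are my own.

```python
# Static audio payload types from RFC 3551: PT -> (encoding, clock rate Hz, channels).
# Only the entries quoted in this post, plus G729 (18), are listed here.
STATIC_AUDIO_PAYLOAD_TYPES = {
    0: ("PCMU", 8000, 1),
    3: ("GSM", 8000, 1),
    4: ("G723", 8000, 1),
    5: ("DVI4", 8000, 1),
    6: ("DVI4", 16000, 1),
    7: ("LPC", 8000, 1),
    8: ("PCMA", 8000, 1),
    9: ("G722", 8000, 1),
    10: ("L16", 44100, 2),
    11: ("L16", 44100, 1),
    12: ("QCELP", 8000, 1),
    18: ("G729", 8000, 1),
}

# Dynamic payload types: meaning is negotiated per session (e.g. via Jingle or SDP).
DYNAMIC_PAYLOAD_TYPES = range(96, 128)


def describe_payload_type(pt):
    """Classify a payload type number against the partial table above."""
    if pt in STATIC_AUDIO_PAYLOAD_TYPES:
        name, clock_rate, _channels = STATIC_AUDIO_PAYLOAD_TYPES[pt]
        return f"static: {name}/{clock_rate}"
    if pt in DYNAMIC_PAYLOAD_TYPES:
        return "dynamic"
    return "not in this partial table"
```

So in the stanza at the top, id 18 (G729) can stand on its own, while id 97 needs the name and clockrate attributes to say what it means.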

Next, SDP
SDP: Session Description Protocol

This document defines the Session Description Protocol, SDP. SDP is
intended for describing multimedia sessions for the purposes of
session announcement, session invitation, and other forms of
multimedia session initiation.

It is a format for describing multimedia sessions, for purposes such as session announcement, session invitation, and other forms of multimedia session initiation.

SDP includes:
o Session name and purpose
o Time(s) the session is active
o The media comprising the session
o Information to receive those media (addresses, ports, formats and
so on)
As resources necessary to participate in a session may be limited,
some additional information may also be desirable:
o Information about the bandwidth to be used by the conference
o Contact information for the person responsible for the session

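SDP includes:
・Session name and purpose
・Time(s) the session is active
・The media used in the session
・Information needed to receive those media (addresses, ports, formats and so on)
・Bandwidth to be used by the conference
・Contact information for the person responsible for the session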

spec

Session description
v= (protocol version)
o= (owner/creator and session identifier).
s= (session name)
i= (session information)
u= (URI of description)
e= (email address)
p= (phone number)
c= (connection information – not required if included in all media)
b= (bandwidth information)
One or more time descriptions (see below)
z= (time zone adjustments)
k= (encryption key)
a=* (zero or more session attribute lines)
Zero or more media descriptions (see below)

Time description
t= (time the session is active)
r=* (zero or more repeat times)

Media description
m= (media name and transport address)
i= (media title)
c= (connection information – optional if included at session-level)
b= (bandwidth information)
k= (encryption key)
a=* (zero or more media attribute lines)

Example

v=0
o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.cs.ucl.ac.uk/staff/M.Handley/sdp.03.ps
e=mjh@isi.edu (Mark Handley)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
m=application 32416 udp wb
a=orient:portrait
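
Since every SDP line is just <type>=<value>, and everything from the first m= line onward belongs to a media description, splitting the example into a session part and media parts is easy to sketch in Python. This is a rough sketch, not a validating parser: it does not check line order or required fields, and parse_sdp is a name I made up.

```python
SDP_EXAMPLE = """
v=0
o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.cs.ucl.ac.uk/staff/M.Handley/sdp.03.ps
e=mjh@isi.edu (Mark Handley)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
m=application 32416 udp wb
a=orient:portrait
"""


def parse_sdp(text):
    """Split SDP text into session-level lines and per-media line lists."""
    session = []
    media = []
    current = session
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first "=" only; values such as URLs may contain more.
        key, sep, value = line.partition("=")
        if sep != "=" or len(key) != 1:
            raise ValueError(f"not an SDP line: {line!r}")
        if key == "m":
            # Each m= line starts a new media description.
            current = []
            media.append(current)
        current.append((key, value))
    return session, media
```

With the example above this gives nine session-level lines and three media descriptions, and the final a=orient:portrait ends up attached to the m=application section rather than to the session.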

Hmm, I see.
I guess the payload-type elements at the very beginning are this, mapped into XMPP.

<payload-type id="97" name="speex" clockrate="8000"/>
<payload-type id="18" name="G729"/>

That kind of thing.
The details can wait until I actually need them.
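
Just to convince myself the mapping really is straightforward, here is a hedged Python sketch that turns payload-type attributes into the corresponding SDP m= and a=rtpmap lines. The function name, the dict shape and the choice of port (taken from the ICE candidate in the first example) are my own assumptions; the exact rules are in XEP-0167, which I have only skimmed.

```python
def payload_types_to_sdp(media, port, payload_types):
    """Build SDP lines from Jingle-style payload types.

    payload_types is a list of dicts with "id" and "name", plus an optional
    "clockrate". An a=rtpmap line is only emitted when a clockrate is given;
    static types such as G729 (18) are already defined by RFC 3551.
    """
    ids = " ".join(str(pt["id"]) for pt in payload_types)
    lines = [f"m={media} {port} RTP/AVP {ids}"]
    for pt in payload_types:
        if "clockrate" in pt:
            lines.append(f"a=rtpmap:{pt['id']} {pt['name']}/{pt['clockrate']}")
    return lines
```

Feeding it the two payload types from the stanza gives an m=audio line listing 97 and 18, plus one a=rtpmap line for speex.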

Even XEP-0167 just says
After successful transport negotiation (not shown here)
and leaves it at that (>_<)

But at last I found
10.2 Jingle Audio via RTP, Negotiated with ICE-UDP
so I'll read up on ICE-UDP next.

There are plenty of other useful XMPP stanza examples too:
10.3 Jingle Audio and Video via RTP, Negotiated with ICE-UDP
10.4 Jingle Audio via SRTP, Negotiated with ICE-UDP
I'll come back to these later.

There also seems to be something called SRTP:
The Secure Real-time Transport Protocol (SRTP)
I'll read it later.

maaash.jp