How libjingle traverses NAT

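More tinkering with libjingle.
The call sample won't run out of the box. The GIPS Media Engine isn't available (and probably can't be used in a commercial app for free anyway), and Linphone won't compile under Visual C++!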

So instead, let's look at how the pcp (file transfer) sample behaves.

If you run the pcp sample with the -d command-line option, the UDP hole-punching process streams into the log.
libjingle lets you set log levels in a LOG4J-like way, which is nice.

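I customized the log output a little and watched it.
Basically, the flow goes like this: take every local IP on my machine plus the global IP obtained from the STUN server,
pair them with the same information from the peer, and try every combination in priority order:
local IP > global IP obtained via STUN ( > relay server IP )
Is this the mechanism called ICE?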
Interactive Connectivity Establishment (ICE): A Protocol for Network
Address Translator (NAT) Traversal for Offer/Answer Protocols
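The pairing described above can be sketched like this. It is a minimal sketch, not libjingle's code: `Candidate`, `CandidatePair` and `FormCheckList` are hypothetical names, and the pair priority uses the formula from the ICE spec, 2^32·min(G,D) + 2·max(G,D) + (G>D ? 1 : 0), where G is the controlling side's candidate priority and D the controlled side's. Every local candidate is paired with every remote one, and the list is sorted so that the highest-priority pair is checked first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical candidate record; field names are illustrative, not libjingle's.
struct Candidate {
    std::string address;  // "ip:port"
    uint32_t priority;    // ICE candidate priority
};

struct CandidatePair {
    Candidate local;
    Candidate remote;
    uint64_t priority;
};

// ICE pair priority: g = controlling agent's candidate priority, d = controlled agent's.
uint64_t PairPriority(uint32_t g, uint32_t d) {
    uint64_t lo = std::min(g, d), hi = std::max(g, d);
    return (lo << 32) + 2 * hi + (g > d ? 1 : 0);
}

// Pair every local candidate with every remote candidate, highest priority first.
std::vector<CandidatePair> FormCheckList(const std::vector<Candidate>& locals,
                                         const std::vector<Candidate>& remotes,
                                         bool controlling) {
    std::vector<CandidatePair> pairs;
    for (const auto& l : locals) {
        for (const auto& r : remotes) {
            uint32_t g = controlling ? l.priority : r.priority;
            uint32_t d = controlling ? r.priority : l.priority;
            pairs.push_back({l, r, PairPriority(g, d)});
        }
    }
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const CandidatePair& a, const CandidatePair& b) {
                         return a.priority > b.priority;
                     });
    return pairs;
}
```

In the log below, the checks that fail (the Hamachi addresses, for example) simply never reach the writable state, and the next pair in the list gets its turn.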

[code][J]Conn[0:private-1:local:114.emobile:1116->private-1:local:192.168.174.1:2584|C-w|udp]: sendstun
[J]Conn[0:private-1:local:114.emobile:1116->private-1:local:5.rem_hamachi:2586|C-w|udp]: sendstun
[J]Conn[0:private-1:local:5.loc_hamachi:1119->private-1:local:192.168.0.2:2580|C-w|udp]: sendstun
Error(port.cc:310): [J]Port[private-1:stun:Net[0:114.emobile]]: Received STUN response with bad username2
[J]Conn[0:private-1:local:5.loc_hamachi:1119->private-1:local:192.168.88.1:2582|C-w|udp]: sendstun
[J]Conn[0:private-1:local:5.loc_hamachi:1119->private-1:local:192.168.174.1:2584|C-w|udp]: sendstun
[J]Conn[0:private-1:local:5.loc_hamachi:1119->private-1:local:5.rem_loc_hamachi:2586|C-w|udp]: sendstun
[J]Conn[0:private-1:local:114.emobile:1116->private-1:stun:61.stun_global:2581|C-w|udp]: sendstun
[J]Conn[0:private-1:local:5.loc_hamachi:1119->private-1:stun:61.stun_global:2581|C-w|udp]: sendstun
[J]Conn[0:private-1:stun:114.emobile:1118->private-1:local:192.168.0.2:2580|C-w|udp]: sendstun
[J]Conn[0:private-1:stun:114.emobile:1118->private-1:local:61.stun_global:2580|CRw|udp]: [r]CRw
[J]Conn[0:private-1:local:5.loc_hamachi:1119->private-1:local:61.stun_global:2580|CRw|udp]: [r]CRw
[J]Conn[0:private-1:local:114.emobile:1116->private-1:local:61.stun_global:2580|CRw|udp]: [r]CRw
[J]Port[private-1:local:Net[0:114.emobile]]: got request, send response
[J]Conn[0:private-1:local:5.loc_hamachi:1119->private-1:local:61.stun_global:2580|CRw|udp]: sendstun
[J]Conn[0:private-1:local:114.emobile:1116->private-1:local:61.stun_global:2580|CRw|udp]: sendstun
[J]Conn[0:private-1:stun:114.emobile:1118->private-1:local:192.168.88.1:2582|C-w|udp]: sendstun
[J]Conn[0:private-1:local:114.emobile:1116->private-1:local:61.stun_global:2580|CRw|udp]: got valid STUN response
[J]Conn[0:private-1:local:114.emobile:1116->private-1:local:61.stun_global:2580|CRW|udp]: [w]CRW[/code]
The lower three octets of each IP have been replaced with names like emobile and stun_global.
Hamachi is in the mix too, and those attempts fail.

The CRW or C-w at the end of each line is the connection's status.
It seems to start at C-w on initialization, and connections that fail are retried after a delay.

[cpp]const char CONNECT_STATE_ABBREV[2] = {
  '-',  // not connected (false)
  'C',  // connected (true)
};
const char READ_STATE_ABBREV[2] = {
  'R',  // STATE_READABLE
  '-',  // STATE_READ_TIMEOUT
};
const char WRITE_STATE_ABBREV[3] = {
  'W',  // STATE_WRITABLE
  'w',  // STATE_WRITE_CONNECT
  '-',  // STATE_WRITE_TIMEOUT
};[/cpp]
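Reading the log with these tables, a pair goes from C-w (checks being sent, not yet readable) to CRw (the peer's request arrived, so it is readable) and finally to CRW once a valid STUN response comes back. Here is a small sketch of how such a status string can be built. The enum names mirror the comments above, but `StateAbbrev` and `IsUsable` are hypothetical helpers, not taken from libjingle.

```cpp
#include <cassert>
#include <string>

// Hypothetical enums whose values match the index order of the arrays above.
enum ReadState { STATE_READABLE = 0, STATE_READ_TIMEOUT = 1 };
enum WriteState { STATE_WRITABLE = 0, STATE_WRITE_CONNECT = 1, STATE_WRITE_TIMEOUT = 2 };

const char CONNECT_STATE_ABBREV[2] = {'-', 'C'};
const char READ_STATE_ABBREV[2] = {'R', '-'};
const char WRITE_STATE_ABBREV[3] = {'W', 'w', '-'};

// Build the three-letter status shown at the end of each log line.
std::string StateAbbrev(bool connected, ReadState r, WriteState w) {
    std::string s;
    s += CONNECT_STATE_ABBREV[connected ? 1 : 0];
    s += READ_STATE_ABBREV[r];
    s += WRITE_STATE_ABBREV[w];
    return s;
}

// Only "CRW" means the pair has been confirmed in both directions.
bool IsUsable(const std::string& abbrev) { return abbrev == "CRW"; }
```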

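Once a connection reaches CRW (reads and writes both succeed), in the file-transfer case it gets wrapped by a class called PseudoTCP, which adds reliability and ordering on top of UDP, and then the file receiver issues an HTTP request over it (?).
I don't need reliability for now, so I'm skipping this part.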

Trying PortAudio for cross-platform audio I/O

This blog gets updated heavily on weekends.
Cross-platform support is a real pain, but I'll give it a try.

I assumed microphone input and playback would be platform dependent too, but PortAudio looks like a good fit.
PortAudio – portable cross-platform Audio API

Someone else is experimenting with libjingle just like me:
Libjingle for Mac OS X

Support for OS X was written several months ago, by defining a new sound card type in mediastreamer (a third-party framework included in libjingle), making use of the cross-platform PortAudio API to do interaction with the audio hardware. We never tested it, so we assumed it didn’t work. Fortunately, a quick test proved that wrong ! So now we are down to one unsupported platform: Windows. Because PortAudio is cross-platform, it should in theory work on other platforms as well, including Windows.

It seems it doesn't work on Windows yet, but it does run on the Mac.
So let's get it running on Windows.

Using the PortAudio SVN repository
Get the source:
[code]svn co https://www.portaudio.com/repos/portaudio/trunk .[/code]

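It ships with a solution file for VC7.1 (!), so I converted it to 2005 and opened it.
・Add Microsoft Platform SDK/include and Microsoft DirectX SDK (August 2008)/include to the compiler's include paths
・Do the same for the linker settings
・ASIO: I registered as a Steinberg developer but still can't log in to get the SDK, so I removed it from the project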

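With Windows' legacy MME, the latency (the lag between a sound being produced and it being output) is said to be 200 to 500 ms, 50 to 100 ms even with DirectSound, and 20 to 50 ms with Mac OS's Sound Manager. With ASIO it depends on buffer size and other settings, but it can be 10 ms or less, and in some environments even under 1 ms.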

That's really tempting, though.
・Linking fails, so add pa_win_waveformat.c

With that, it builds. There's a C++ wrapper too.
I see.

Let's use this to build a MediaEngine for libjingle.

Trying the gist embed

I just grabbed the latest one.

[code]require 'hotcocoa'
include HotCocoa

application :name => "Hello World" do |app|
  window :size => [150, 100] do |win|
    win.view = layout_view :mode => :vertical do |layout|
      layout.spacing = 10
      lbl = label(:text => "", :layout => {:align => :center})
      btn = button(:title => "Say Hi", :layout => {:align => :center})
      btn.on_action { lbl.text = "Hello World" }
      layout << lbl
      layout << btn
    end
    win.will_close { exit }
  end
end[/code]

Getting Tab Mix Plus onto FF 3.0.1, 3.0.2 and 3.0.3

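I really can't live without this one.
It doesn't officially support these versions yet, so I looked at various pages such as
 http://de-lab.com/article/firefox30-addon/
but the build linked there only works up to FF3, not FF3.0.1. With
 http://tmp.garyr.net/dev-builds/tab_mix_plus-0.3.7pre.080920.xpi
it works!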

The 3rd Overlay Network Study Group, and looking into Gnutella

I attended the 3rd Overlay Network Study Group.
http://wslash.com/?p=1928
http://mixi.jp/view_event.pl?id=34638897&comm_id=6936

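The talk from the ISP side was interesting, since it's something I'd never thought about.
P2P looks network-friendly at first glance,
but depending on the physical network topology that isn't necessarily true. That was new to me.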

Server to Client
A --> B
A --> C
A --> D

If you deliver via P2P instead, for example like this:
A --> B
B --> C
B --> D

then, if the physical distances actually look like this:
A--CD-----------------B

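the P2P case has to pay for the long B->C and B->D hops, so Server to Client is actually more efficient overall. That was the point.
I'm sure it was heavily simplified, but it made intuitive sense to me.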
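To make the argument concrete, here is a toy calculation. The positions are assumed (A at 0, C at 2, D at 3 and B at 20 on a single line, loosely following the sketch above), and the cost is simply the sum of the physical distances covered by all transfers. Real networks are of course not one-dimensional, and `TotalDistance` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

// Assumed 1-D positions for "A--CD-----------------B".
const std::map<char, int> kPos = {{'A', 0}, {'C', 2}, {'D', 3}, {'B', 20}};

// Total physical distance covered by a set of unicast transfers (from, to).
int TotalDistance(const std::vector<std::pair<char, char>>& transfers) {
    int total = 0;
    for (const auto& t : transfers)
        total += std::abs(kPos.at(t.first) - kPos.at(t.second));
    return total;
}
```

With these numbers, server-to-client costs 20 + 2 + 3 = 25 units, while relaying through B costs 20 + 18 + 17 = 55.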

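That's what the P4P initiative is for.
After the presentation, someone suggested that clients could just run traceroute against each other,
which also sounded interesting.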

Also, you can buy data mapping IP addresses to locations,
and it feels like there could be a clever way to use that too.

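On the way out, someone whose name I've unfortunately forgotten told me,
"Gnutella implements NAT traversal too; its code might be a useful reference."
(I had assumed it couldn't traverse NAT, so I'm late to this, but)
I'm looking into Gnutella as well.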

http://en.wikipedia.org/wiki/Gnutella

Very detailed.

Each peer is classified as one of the following:
leaf node / ultra node

LimeWire tells me my desktop (behind an ADSL router) is a leaf node.
Probably because it's behind NAT.
It's been ages since I installed a P2P file-sharing app; not since the heyday of WinMX.

A leaf node connects to about 3 ultrapeers.
An ultrapeer, on the other hand, connects to 32 or more other ultrapeers.

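For search, nodes exchange Query Routing Tables (QRTs) holding hashes of the keywords of their shared files via the Query Routing Protocol (QRP), and a query is routed onward only toward nodes whose QRT contains the hashes of all of its search terms.

A node that has a file matching the query returns a result to the node that issued the search.
Originally, results traveled back along the reverse of the path the query had come in on, but that mechanism was later updated:
now the node holding the results sends them directly to the searching node over UDP.
Really! I had no idea.
So it looks like NAT traversal is happening here, at least.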
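Here is a toy version of the QRT idea: a bitmap indexed by keyword hash, where a query is passed on only if every keyword's bit is set. This is an illustration under assumptions, not the real QRP. `std::hash` stands in for QRP's own hash function, and the real table size, compression and patch messages are ignored. `QueryRoutingTable` is a hypothetical class.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Toy Query Routing Table: one bit per hash slot.
class QueryRoutingTable {
public:
    explicit QueryRoutingTable(size_t slots) : bits_(slots, false) {}

    // Register every keyword of a shared file name.
    void AddFile(const std::string& name) {
        for (const auto& w : Split(name)) bits_[Slot(w)] = true;
    }

    // Forward a query only if every keyword's slot is set.
    // Hash collisions can give false positives, but never false negatives.
    bool MayMatch(const std::string& query) const {
        std::vector<std::string> words = Split(query);
        if (words.empty()) return false;
        for (const auto& w : words)
            if (!bits_[Slot(w)]) return false;
        return true;
    }

private:
    std::vector<bool> bits_;

    size_t Slot(const std::string& word) const {
        return std::hash<std::string>{}(word) % bits_.size();
    }

    static std::vector<std::string> Split(const std::string& s) {
        std::istringstream in(s);
        std::vector<std::string> out;
        for (std::string w; in >> w;) out.push_back(w);
        return out;
    }
};
```

Because a QRT only says "might have it", the ultrapeer saves traffic by skipping leaves that certainly do not match, and it still lets the leaf make the final decision.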

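For that reason, a search query carries the searching node's IP address and port number.
Apparently this helped a lot in cutting Gnutella traffic and in letting the Gnutella network scale.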

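When you decide to download a file from the search results, the file transfer is negotiated. If the file's owner is not behind a firewall, the receiver connects to it directly.
(Gnutella documents seem to lump firewalls and NAT together under the single word "firewall".)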

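If the file's owner is behind a firewall, the receiver sends it a push request. In the early days of Gnutella, push requests were also routed through the Gnutella network, but because that network is unstable, push proxies have since been introduced. Push proxies are usually ultrapeers, and their addresses are included in the search results returned to the searcher.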

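The receiver sends an HTTP request to the push proxy, and the push proxy forwards a push request to the file's owner. The owner then opens a connection to the receiver itself, and the file is transferred over that connection.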

Keywords I want to dig into further:
RUDP: Reliable UDP protocol used for NAT-to-NAT transfers; sometimes called Firewall-to-Firewall
It's a mystery, because NAT traversal is described as if it were something separate from this.

Grepping LimeWire's code turns up lots of RUDP, so I'll read a bit of it.

Is C# (Mono) the way to build cross-platform desktop apps?

typester said something like "Not supporting the Mac? Nobody does that these days," so I looked into it.

C# might be a good choice.
What is Mono?

Mono provides the necessary software to develop and run .NET client and server applications on Linux, Solaris, Mac OS X, Windows, and Unix

Mono 1.9, released this March, has
# System.Media implemented.
which is great too.

Trying it out
There's an apt package for Debian lenny.
If you use System.Windows.Forms,
don't forget libmono-winforms2.0-cil.

First, hello world in C#
[csharp]
santrini% cat helloworld.cs
using System;
using System.Windows.Forms;

public class HelloWorld : Form
{
    static public void Main ()
    {
        Application.Run (new HelloWorld ());
    }

    public HelloWorld ()
    {
        Text = "Hello Mono World";
    }
}
santrini% gmcs helloworld.cs -r:System.Windows.Forms.dll
santrini% export DISPLAY=localhost:51.0; mono helloworld.exe
[/csharp]
It looks like this.

Sound playback looks like this.
It's a Windows Forms application created from the template in Visual C# 2008 Express Edition on Windows.
[csharp]santrini% cat Program.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

namespace sound
{
    static class Program
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            Application.EnableVisualStyles();
            Application.SetCompatibleTextRenderingDefault(false);
            Application.Run(new Form1());
        }
    }
}
santrini% cat Form1.cs
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Text;
using System.Windows.Forms;

namespace sound
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            WavPlayer player = new WavPlayer();
            // System.Media.SoundPlayer only plays .wav files; an .mp3 path will fail.
            player.play("/home/mo/tmp/csharp/sound/1.wav");
        }
    }
}
santrini% cat WavPlayer.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Media;

namespace sound
{
    class WavPlayer
    {
        public WavPlayer()
        {
        }

        // Plays the given WAV file asynchronously via System.Media.SoundPlayer.
        public void play(string wavfile)
        {
            SoundPlayer player = new SoundPlayer(wavfile);
            player.Play();
        }
    }
}
santrini% gmcs Program.cs Form1.cs WavPlayer.cs -r:System.Windows.Forms.dll
santrini% export DISPLAY=localhost:51.0; mono Program.exe
[/csharp]
This is a real option.

How to establish a P2P voice chat session over XMPP, part 4

Continued from:
How to establish a P2P voice chat session over XMPP, part 3

I ran libjingle's call sample (P2P voice calls) and looked at the log
to see which XMPP stanzas the two sides exchange.

Running call.exe with the -d option prints debug information to the log.

libjingle supports GIPSMediaEngine and linphone out of the box, but
GIPSMediaEngine isn't available,
and linphone doesn't build properly on Windows,
so for now I specify an arbitrary payload-type.
The IP addresses have also been altered.

The XMPP JIDs look like this:
caller: [email protected]/callB3B38FB3
callee: [email protected]/call7DA716D7

Sadly, though...
libjingle Developer Guide

libjingle was created at about the same time as the Jingle XMPP extension (XEP-0166). The libjingle team created their own protocol to handle session negotiation, and later worked with the XMPP Standards Foundation to standardize Jingle; thus, although the libjingle protocol and Jingle are very similar, they are not the same, and are not interoperable.

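So libjingle is incompatible with the Jingle spec I've been reading so far and doesn't implement Jingle itself, and on top of that libjingle's own protocol is undocumented (as far as I can tell, since I can't find it anywhere).
There's an obstacle at every turn ><
I'll carry on, counting on that "very similar".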

Session initiation
Lots of messages are exchanged asynchronously, so looking at the id attribute
tells you which request/response pairs belong together.

SEND >>>>>>>>>>>>>>>>>>>>>>>>> : Sun Sep 21 17:30:02 2008
[xml]

[/xml]
Next it sends its own candidate (emobile gives me a global IP)

SEND >>>>>>>>>>>>>>>>>>>>>>>>> : Sun Sep 21 17:30:03 2008
[xml]

[/xml]

An ack comes back for the initial session initiation

RECV <<<<<<<<<<<<<<<<<<<<<<<<< : Sun Sep 21 17:30:03 2008
[xml] [/xml]

It sends one more candidate (this one is the Hamachi IP, which is offline) id="9"

SEND >>>>>>>>>>>>>>>>>>>>>>>>> : Sun Sep 21 17:30:03 2008
[xml]

[/xml]

A transport-accept comes back for the candidate with id="8"

RECV <<<<<<<<<<<<<<<<<<<<<<<<< : Sun Sep 21 17:30:03 2008
[xml]

[/xml]

It returns an ack for id="8"

SEND >>>>>>>>>>>>>>>>>>>>>>>>> : Sun Sep 21 17:30:03 2008
[xml] [/xml]

And so on...

Finally...
libjingle doesn't log the STUN part, so I'll look at that next, but
once the call connects, it looks like this.

I guess this is where the callee says it accepts the (custom) payload-type?

RECV <<<<<<<<<<<<<<<<<<<<<<<<< : Sun Sep 21 17:30:13 2008
[xml]

[/xml]

SEND >>>>>>>>>>>>>>>>>>>>>>>>> : Sun Sep 21 17:30:13 2008
[xml] [/xml]

Next, let's look at the STUN exchange.

How to establish a P2P voice chat session over XMPP, part 3

Continued from:
How to establish a P2P voice chat session over XMPP, part 2

Reading
XEP-0176: Jingle ICE-UDP Transport Method

This specification defines a Jingle transport method that results in sending media data using raw datagram sockets via the User Datagram Protocol (UDP). This transport method is negotiated via the Interactive Connectivity Establishment (ICE) methodology defined by the IETF and thus provides robust NAT traversal for media traffic.

There is also a parallel spec,
XEP-0177: Jingle Raw UDP Transport Method

This specification defines a Jingle transport method that results in sending media data using raw datagram sockets via the User Datagram Protocol (UDP). This simple transport method does not provide NAT traversal, and the ICE-UDP transport method should be used if NAT traversal is required.

If you need NAT traversal, use the former. I'll read the latter later.

The current document defines a transport method for establishing and managing data exchanges between XMPP entities over the User Datagram Protocol (see RFC 768 [2]), using the ICE methodology developed within the IETF and specified in Interactive Connectivity Establishment (ICE) [3] (hereafter referred to as ICE-CORE). Use of the ice-udp method results in a lossy transport suitable for media applications where some packet loss is tolerable (e.g., audio and video).

It defines how XMPP entities transfer data over UDP using ICE. ice-udp suits audio and video applications that can tolerate packet loss.

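Differences from ICE
・XMPP, rather than SIP, is used as the signaling channel
・Whereas ICE sends all transport candidates at once, here they may be sent one at a time. Using XMPP's request/response mechanism, higher-priority transports can be communicated sooner, so negotiation is faster.
・SDP syntax is mapped onto XMPP's XML (the payload type part from last time, I guess)
・ICE candidates can be upgraded during a session (e.g., to change an IP address)
Does that mean it survives IP address changes? Mobility support? Not sure what this is about.
・Either XMPP entity can renegotiate at any time during the session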

Ah, but it looks like I can't make sense of this without reading ICE-CORE first...

Interactive Connectivity Establishment (ICE): A Protocol for Network
Address Translator (NAT) Traversal for Offer/Answer Protocols
draft-ietf-mmusic-ice-19

119 pages ><

Abstract
This document describes a protocol for Network Address Translator
(NAT) traversal for UDP-based multimedia sessions established with
the offer/answer model. This protocol is called Interactive
Connectivity Establishment (ICE). ICE makes use of the Session
Traversal Utilities for NAT (STUN) protocol and its extension,
Traversal Using Relay NAT (TURN). ICE can be used by any protocol
utilizing the offer/answer model, such as the Session Initiation
Protocol (SIP).

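It describes NAT traversal for UDP-based multimedia sessions established with the offer/answer model.
ICE uses STUN and its extension, TURN.
It can be used by any protocol that follows the offer/answer model, such as SIP.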

Excerpts of the parts that caught my attention

2.5. Security for Checks
Each STUN
connectivity check is covered by a message authentication code (MAC)
computed using a key exchanged in the signalling channel. This MAC
provides message integrity and data origin authentication, thus
stopping an attacker from forging or modifying connectivity check
messages.

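This is the security of the peer-to-peer connectivity checks done with STUN Binding Requests/Responses.
Each check carries a MAC computed with a key exchanged over XMPP (the signaling channel). It protects the integrity and origin of the checks; it does not encrypt them.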

More fundamentally, however, the prioritization
defined by this specification may not yield “optimal” results. As an
example, if the aim is to select low latency media paths, usage of a
relay is a hint that latencies may be higher, but it is nothing more
than a hint. An actual RTT measurement could be made, and it might
demonstrate that a pair with lower priority is actually better than
one with higher priority.

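The prioritization this spec defines for choosing between peer candidate pairs is not necessarily the best one.
If you actually measured RTT, going through a relay server might even turn out to be faster.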

There's an "Example" section!
Maybe reading just that is enough.

The SDP offer carrying the ICE candidates (obtained via STUN Binding Requests) looks like this:

v=0
o=jdoe 2890844526 2890842807 IN IP4 10.0.1.1
s=
c=IN IP4 192.0.2.3
t=0 0
a=ice-pwd:asd88fgpdd777uzjYhagZg
a=ice-ufrag:8hhY
m=audio 45664 RTP/AVP 0
b=RS:0
b=RR:0
a=rtpmap:0 PCMU/8000
a=candidate:1 1 UDP 2130706431 10.0.1.1 8998 typ host
a=candidate:2 1 UDP 1694498815 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998
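The priority values in those a=candidate lines can be reproduced with ICE-CORE's formula: priority = 2^24 × type preference + 2^8 × local preference + (256 − component ID). The sketch below assumes the spec's recommended type preferences (126 host, 110 peer reflexive, 100 server reflexive, 0 relayed) and a local preference of 65535 for the single interface. With those values it yields 2130706431 for the host candidate and 1694498815 for the srflx one. `CandidatePriority` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Recommended type preferences (assumed; an implementation may choose others).
enum class CandidateType : uint32_t {
    kHost = 126,
    kPeerReflexive = 110,
    kServerReflexive = 100,
    kRelayed = 0,
};

// priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
uint32_t CandidatePriority(CandidateType type, uint32_t local_pref, uint32_t component_id) {
    assert(local_pref <= 65535);
    assert(component_id >= 1 && component_id <= 256);
    return (static_cast<uint32_t>(type) << 24) + (local_pref << 8) + (256 - component_id);
}
```

This is why host candidates are tried before srflx ones: the type preference sits in the top bits and dominates everything else.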

Mapped back onto XMPP,
this would probably become the transport shown here.
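Here is a sketch of that mapping: one SDP a=candidate line turned into a Jingle `<candidate/>` element. The attribute names follow the XEP-0176 examples quoted in this series. `SdpCandidateToJingle` is a hypothetical helper, and generation='0' / network='1' are hard-coded assumptions because SDP has no equivalent fields. ufrag and pwd would come from the a=ice-ufrag / a=ice-pwd lines and are left out here.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Convert "a=candidate:<foundation> <component> <proto> <priority> <ip> <port> typ <type> ..."
// into a Jingle ICE-UDP <candidate/> element. Returns "" if the line doesn't parse.
std::string SdpCandidateToJingle(const std::string& line) {
    const std::string prefix = "a=candidate:";
    if (line.compare(0, prefix.size(), prefix) != 0) return "";
    std::istringstream in(line.substr(prefix.size()));
    std::string foundation, component, protocol, priority, ip, port, typ, type;
    if (!(in >> foundation >> component >> protocol >> priority >> ip >> port >> typ >> type) ||
        typ != "typ")
        return "";
    for (char& c : protocol) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    std::ostringstream out;
    out << "<candidate component='" << component << "' foundation='" << foundation
        << "' generation='0' ip='" << ip << "' network='1' port='" << port
        << "' priority='" << priority << "' protocol='" << protocol
        << "' type='" << type << "'/>";
    return out.str();
}
```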

Back to Jingle ICE-UDP Transport

The transport negotiation process is defined in the Protocol Description section of this document.

All right!

First:
5.2 Transport Initiation
[xml]
[ … ]
<transport xmlns='urn:xmpp:tmp:jingle:transports:ice-udp'
           pwd='asd88fgpdd777uzjYhagZg'
           ufrag='8hhy'/>
[/xml]
pwd and ufrag are specified in ICE-CORE:

The ice-ufrag and ice-pwd attributes MUST be chosen randomly at the beginning of a session.

This attribute is used with Interactive Connectivity Establishment (ICE), and provides the password used to protect STUN connectivity checks.

Each client generates them when the session is initialized.
They correspond to a username (ufrag) and a password (pwd),
and are used later in the STUN Binding Requests/Responses.
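Generating them is straightforward. This is a minimal sketch that assumes ICE-CORE's grammar (ice-char = ALPHA / DIGIT / "+" / "/", ufrag at least 4 characters, pwd at least 22). The names are hypothetical. `std::random_device` is used only for illustration, so a real client should confirm it is backed by a cryptographically strong source.

```cpp
#include <cassert>
#include <random>
#include <string>

// ice-char = ALPHA / DIGIT / "+" / "/"
const std::string kIceChars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Random string of ice-chars of the given length.
std::string RandomIceString(size_t length) {
    static std::random_device rd;
    std::uniform_int_distribution<size_t> pick(0, kIceChars.size() - 1);
    std::string s;
    for (size_t i = 0; i < length; ++i) s += kIceChars[pick(rd)];
    return s;
}

struct IceCredentials {
    std::string ufrag;  // at least 4 characters
    std::string pwd;    // at least 22 characters
};

// Fresh credentials at the beginning of each session.
IceCredentials NewSessionCredentials() {
    return {RandomIceString(4), RandomIceString(22)};
}
```

The lengths match the example above: '8hhy' is 4 characters and 'asd88fgpdd777uzjYhagZg' is 22.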

Response
[xml][/xml]

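After this comes the ICE transport negotiation.
Each side sends STUN Binding Requests to the STUN server to gather its own candidates (IP/port combinations).
Once they're gathered, the candidates are sent over XMPP like this. You may send several transport elements containing candidates one after another, but in priority order.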
[xml]
<transport xmlns='urn:xmpp:tmp:jingle:transports:ice-udp'
           pwd='asd88fgpdd777uzjYhagZg'
           ufrag='8hhy'>
[/xml]

The recipient (juliet) sends
[xml]
[/xml]
back as a response to each of them.
At the same time, juliet sends back her own candidates, wrapped in a transport element.

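Once the sender (romeo) has received juliet's candidates,
the peers run connectivity checks directly with each other (P2P).
They exchange STUN Binding Requests/Responses and check, in priority order, which of their candidate pairs can connect.
The sender's STUN Binding Requests carry the ICE-CONTROLLING attribute,
and the receiver's carry ICE-CONTROLLED.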

They use STUN short-term credentials built from the ice-ufrag and ice-pwd set at session initialization, which ties the exchange going through the XMPP server to the P2P exchange.
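As I read ICE-CORE, the credentials are combined as follows. The STUN USERNAME of an outgoing check is the peer's ufrag, a colon, then your own ufrag, and MESSAGE-INTEGRITY (an HMAC-SHA1) is keyed with the peer's pwd. The sketch only assembles these values and does not compute the HMAC. `MakeCheckCredentials` is a hypothetical name, and libjingle's own protocol may well do this differently.

```cpp
#include <cassert>
#include <string>

enum class IceRole { kControlling, kControlled };

// What goes into one outgoing connectivity check (HMAC itself not computed here).
struct CheckCredentials {
    std::string username;        // STUN USERNAME attribute
    std::string integrity_key;   // key for MESSAGE-INTEGRITY (HMAC-SHA1)
    std::string role_attribute;  // ICE-CONTROLLING or ICE-CONTROLLED
};

// USERNAME = <peer's ufrag> ":" <own ufrag>; key = peer's pwd.
CheckCredentials MakeCheckCredentials(const std::string& local_ufrag,
                                      const std::string& remote_ufrag,
                                      const std::string& remote_pwd,
                                      IceRole role) {
    return {remote_ufrag + ":" + local_ufrag, remote_pwd,
            role == IceRole::kControlling ? "ICE-CONTROLLING" : "ICE-CONTROLLED"};
}
```

Using romeo's ufrag 8hhy and juliet's 9uB6 / YH75Fviy6338Vbrhrlp8Yh from the examples, romeo's check to juliet would carry USERNAME 9uB6:8hhy and be keyed with juliet's pwd.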

Once both sides have established the connection, the receiver notifies the sender that it accepts the candidate.
Example 6. Responder sends candidates
[xml]
<transport xmlns='urn:xmpp:tmp:jingle:transports:ice-udp'
           pwd='YH75Fviy6338Vbrhrlp8Yh'
           ufrag='9uB6'>
[/xml]

The sender acks it
[xml]
[/xml]

It also seems you can change a negotiation that has already succeeded, or add candidates when a new IP or NIC appears.

I think I've got a feel for the normal (success) flow now.
Next: get libjingle running, or write a small test app of my own.

How to establish a P2P voice chat session over XMPP, part 2

Continued from:
How to establish a P2P voice chat session over XMPP, part 1

Inside the XMPP stanza(?) used by Jingle there's a part like this:
# Is "stanza" the right word for this?

[xml]
<content creator='initiator' name='voice'>
  <description xmlns='urn:xmpp:tmp:jingle:apps:rtp' media='audio'>
    <payload-type id='97' name='speex' clockrate='8000'/>
    <payload-type id='18' name='G729'/>
  </description>
  <transport xmlns='urn:xmpp:tmp:jingle:transports:ice-udp'>
    <candidate component='1'
               foundation='1'
               generation='0'
               ip='192.0.2.3'
               network='1'
               port='45664'
               priority='1678246398'
               protocol='udp'
               pwd='asd88fgpdd777uzjYhagZg'
               type='srflx'
               ufrag='8hhy'/>
  </transport>
</content>
[/xml]

The contents of <description> are not covered by
XEP-0166: Jingle.

XEP-0166 says:

Naturally, more complex scenarios are probable; such scenarios are described in other specifications, such as XEP-0167 for voice chat.

so let's follow that pointer.

XEP-0167: Jingle RTP Sessions
This specification defines a Jingle application type for negotiating one or more sessions that use the Real-time Transport Protocol (RTP) to exchange media such as voice or video. The application type includes a straightforward mapping to Session Description Protocol (SDP) for interworking with SIP media endpoints.
It defines an application type for negotiating one or more sessions that use RTP to exchange media such as audio and video.
The application type includes a one-to-one mapping to SDP so that it can interwork with SIP.

I don't quite get what an "application type" is, but
XEP-0166 Jingle says

wide variety of application types (e.g., voice chat, video chat, file sharing)

so I guess it can roughly be thought of as the kind of P2P application.

New terms I don't really understand yet:
・RTP
・SDP

RTP is the Real-time Transport Protocol:
RTP: A Transport Protocol for Real-Time Applications

RTP
provides end-to-end network transport functions suitable for
applications transmitting real-time data, such as audio, video or
simulation data, over multicast or unicast network services. RTP
does not address resource reservation and does not guarantee
quality-of-service for real-time services. The data transport is
augmented by a control protocol (RTCP) to allow monitoring of the
data delivery in a manner scalable to large multicast networks, and
to provide minimal control and identification functionality. RTP and
RTCP are designed to be independent of the underlying transport and
network layers. The protocol supports the use of RTP-level
translators and mixers.

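It provides transport functions for applications that send real-time data such as audio, video, or simulation data over multicast or unicast network services. It does not address resource reservation or QoS for real-time services.
Data transport is augmented by a control protocol (RTCP) that allows the delivery to be monitored (in a way that scales to large multicast networks). RTP and RTCP are designed independently of the layers below them. RTP-level translators and mixers are supported.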

provides end-to-end delivery services for data with real-time
characteristics, such as interactive audio and video. Those services
include payload type identification, sequence numbering, timestamping
and delivery monitoring.

So: payload type identification, sequence numbering, timestamping, and delivery monitoring.

Whoa, the RTP RFC is 104 pages. No way. I'll just pull out some excerpts.

While RTP is primarily designed to satisfy the needs of multi-
participant multimedia conferences, it is not limited to that

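RTP was designed for multi-participant multimedia conferences (though it isn't limited to them).
Ah, so that's why it has the CSRC and SSRC that come up below.
Knowing what something was designed for makes its protocol design much easier to understand, which is interesting.
For example, in terms of how SIP and XMPP differ.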

RTP packet: A data packet consisting of the fixed RTP header, a
possibly empty list of contributing sources (see below), and the
payload data. Some underlying protocols may require an
encapsulation of the RTP packet to be defined. Typically one
packet of the underlying protocol contains a single RTP packet,
but several RTP packets MAY be contained if permitted by the
encapsulation method (see Section 11).

Synchronization source (SSRC): The source of a stream of RTP
packets, identified by a 32-bit numeric SSRC identifier carried in
the RTP header so as not to be dependent upon the network address.
All packets from a synchronization source form part of the same
timing and sequence number space, so a receiver groups packets by
synchronization source for playback. Examples of synchronization
sources include the sender of a stream of packets derived from a
signal source such as a microphone or a camera, or an RTP mixer
(see below). A synchronization source may change its data format,
e.g., audio encoding, over time. The SSRC identifier is a
randomly chosen value meant to be globally unique within a
particular RTP session (see Section 8). A participant need not
use the same SSRC identifier for all the RTP sessions in a
multimedia session; the binding of the SSRC identifiers is
provided through RTCP (see Section 6.5.1). If a participant
generates multiple streams in one RTP session, for example from
separate video cameras, each MUST be identified as a different
SSRC.

Contributing source (CSRC): A source of a stream of RTP packets
that has contributed to the combined stream produced by an RTP
mixer (see below). The mixer inserts a list of the SSRC
identifiers of the sources that contributed to the generation of a
particular packet into the RTP header of that packet. This list
is called the CSRC list. An example application is audio
conferencing where a mixer indicates all the talkers whose speech
was combined to produce the outgoing packet, allowing the receiver
to indicate the current talker, even though all the audio packets
contain the same SSRC identifier (that of the mixer).

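RTP packet: a fixed RTP header, a (possibly empty) list of contributing sources, and the payload data.
SSRC: a randomly chosen ID, unique within the RTP session, for a source such as a microphone or video camera (or an RTP mixer).
The receiver groups packets by this ID and orders them by sequence number and timestamp for playback.
CSRC: when the SSRC is an RTP mixer combining packets from several sources,
the mixer lists the contributing sources so the receiver can still tell who is speaking.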

The RTP header
The RTP header has the following format:

[code]
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
[/code]

[cpp]/*
 * RTP data header
 * (bit-field order is compiler and byte-order dependent)
 */
typedef struct {
    unsigned int version:2;   /* protocol version */
    unsigned int p:1;         /* padding flag */
    unsigned int x:1;         /* header extension flag */
    unsigned int cc:4;        /* CSRC count */
    unsigned int m:1;         /* marker bit */
    unsigned int pt:7;        /* payload type */
    unsigned int seq:16;      /* sequence number */
    u_int32 ts;               /* timestamp */
    u_int32 ssrc;             /* synchronization source */
    u_int32 csrc[1];          /* optional CSRC list */
} rtp_hdr_t;[/cpp]
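Bit-field layout is up to the compiler and the byte order, so the struct above is not a portable way to read packets off the wire. The sketch below parses the fixed header and CSRC list from raw bytes in network byte order instead. `RtpHeader` and `ParseRtpHeader` are hypothetical names, and header extensions and padding are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RtpHeader {
    uint8_t version;
    bool padding;
    bool extension;
    uint8_t csrc_count;
    bool marker;
    uint8_t payload_type;
    uint16_t sequence;
    uint32_t timestamp;
    uint32_t ssrc;
    std::vector<uint32_t> csrcs;
};

// Read a big-endian (network order) 32-bit value.
static uint32_t ReadU32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) | (uint32_t(p[2]) << 8) | p[3];
}

// Parse the fixed 12-byte header plus the CSRC list.
// Returns false if the buffer is too short or the version is not 2.
bool ParseRtpHeader(const uint8_t* data, size_t len, RtpHeader* h) {
    if (len < 12) return false;
    h->version = data[0] >> 6;
    h->padding = (data[0] >> 5) & 1;
    h->extension = (data[0] >> 4) & 1;
    h->csrc_count = data[0] & 0x0F;
    h->marker = (data[1] >> 7) & 1;
    h->payload_type = data[1] & 0x7F;
    h->sequence = static_cast<uint16_t>((data[2] << 8) | data[3]);
    h->timestamp = ReadU32(data + 4);
    h->ssrc = ReadU32(data + 8);
    if (h->version != 2 || len < 12 + 4u * h->csrc_count) return false;
    h->csrcs.clear();
    for (int i = 0; i < h->csrc_count; ++i) h->csrcs.push_back(ReadU32(data + 12 + 4 * i));
    return true;
}
```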

A set of default mappings for
audio and video is specified in the companion RFC 3551

Payload types are described in RFC 3551.

RTP Profile for Audio and Video Conferences with Minimal Control
Abstract

This document describes a profile called “RTP/AVP” for the use of the
real-time transport protocol (RTP), version 2, and the associated
control protocol, RTCP, within audio and video multiparticipant
conferences with minimal control. It provides interpretations of
generic fields within the RTP specification suitable for audio and
video conferences. In particular, this document defines a set of
default mappings from payload type numbers to encodings.

This document also describes how audio and video data may be carried
within RTP. It defines a set of standard encodings and their names
when used within RTP. The descriptions provide pointers to reference
implementations and the detailed standards. This document is meant
as an aid for implementors of audio, video and other real-time
multimedia applications.

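It describes a profile, "RTP/AVP", for running multi-participant audio and video conferences with minimal control using RTP version 2 and RTCP.
It also explains how audio and video data are carried in RTP.
Thanks for the aid!!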

The table for the PT field of the RTP packet:

[code]
PT   encoding   media type   clock rate   channels
     name                    (Hz)
___________________________________________________
0    PCMU       A             8,000       1
1    reserved   A
2    reserved   A
3    GSM        A             8,000       1
4    G723       A             8,000       1
5    DVI4       A             8,000       1
6    DVI4       A            16,000       1
7    LPC        A             8,000       1
8    PCMA       A             8,000       1
9    G722       A             8,000       1
10   L16        A            44,100       2
11   L16        A            44,100       1
12   QCELP      A             8,000       1
(continues...)
[/code]

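It also gives an overview of each audio codec with pointers to the relevant RFCs, plus pointers to the spec to follow when you want to define a new payload type.
Speex isn't listed, so maybe the 96 and 97 used in the Jingle spec are the way to go?
It's probably written down somewhere, so I'll look it up eventually.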
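RFC 3551 reserves PT 96 to 127 as the dynamic range. Those numbers have no fixed meaning and are bound to an encoding per session through signaling (an rtpmap line in SDP, a payload-type element in Jingle), which would explain why speex shows up as 96/97. Here is a minimal lookup sketch. The table holds only a few static audio entries, and `ResolvePt` / `IsDynamicPt` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Encoding {
    std::string name;
    int clock_rate;
    int channels;
};

// A few static audio assignments from RFC 3551.
const std::map<int, Encoding> kStaticAudioPt = {
    {0, {"PCMU", 8000, 1}},  {3, {"GSM", 8000, 1}},    {4, {"G723", 8000, 1}},
    {8, {"PCMA", 8000, 1}},  {9, {"G722", 8000, 1}},   {10, {"L16", 44100, 2}},
    {11, {"L16", 44100, 1}}, {18, {"G729", 8000, 1}},
};

bool IsDynamicPt(int pt) { return pt >= 96 && pt <= 127; }

// Static table first; dynamic PTs only resolve through the session's own bindings.
bool ResolvePt(int pt, const std::map<int, Encoding>& session, Encoding* out) {
    auto it = kStaticAudioPt.find(pt);
    if (it != kStaticAudioPt.end()) { *out = it->second; return true; }
    if (!IsDynamicPt(pt)) return false;
    auto dyn = session.find(pt);
    if (dyn == session.end()) return false;
    *out = dyn->second;
    return true;
}
```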

Next, SDP
SDP: Session Description Protocol

This document defines the Session Description Protocol, SDP. SDP is
intended for describing multimedia sessions for the purposes of
session announcement, session invitation, and other forms of
multimedia session initiation.

It is for describing multimedia sessions so that they can be set up: session announcement, session invitation, and so on.

SDP includes:
o Session name and purpose
o Time(s) the session is active
o The media comprising the session
o Information to receive those media (addresses, ports, formats and
so on)
As resources necessary to participate in a session may be limited,
some additional information may also be desirable:
o Information about the bandwidth to be used by the conference
o Contact information for the person responsible for the session

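SDP contains:
・the session name and purpose
・the time(s) the session is active
・the media used in the session
・the information needed to receive the media (addresses, ports, formats, etc.)
・the bandwidth used by the conference
・contact information for the person responsible for the session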

spec

Session description
v= (protocol version)
o= (owner/creator and session identifier).
s= (session name)
i= (session information)
u= (URI of description)
e= (email address)
p= (phone number)
c= (connection information – not required if included in all media)
b= (bandwidth information)
One or more time descriptions (see below)
z= (time zone adjustments)
k= (encryption key)
a=* (zero or more session attribute lines)
Zero or more media descriptions (see below)

Time description
t= (time the session is active)
r=* (zero or more repeat times)

Media description
m= (media name and transport address)
i= (media title)
c= (connection information – optional if included at session-level)
b= (bandwidth information)
k= (encryption key)
a=* (zero or more media attribute lines)

Example

v=0
o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.cs.ucl.ac.uk/staff/M.Handley/sdp.03.ps
e=mjh@isi.edu (Mark Handley)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
m=application 32416 udp wb
a=orient:portrait

I see.
The payload elements at the very beginning are probably this mapped into XMPP:

[xml]
<payload-type id='97' name='speex' clockrate='8000'/>
<payload-type id='18' name='G729'/>
[/xml]

Things like that.
I'll dig into the details when I need them.
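Here is a minimal sketch of pulling the m= lines out of an SDP body like the example above. It handles only the type=value line format and m= lines. `ParseSdpMedia` and `SdpMedia` are hypothetical names, not a real SDP library.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct SdpMedia {
    std::string media;                 // "audio", "video", "application"
    int port = 0;
    std::string proto;                 // e.g. "RTP/AVP"
    std::vector<std::string> formats;  // PT numbers or format names
};

// Keep only "m=<media> <port> <proto> <fmt> ..." lines; everything else is skipped.
std::vector<SdpMedia> ParseSdpMedia(const std::string& sdp) {
    std::vector<SdpMedia> media;
    std::istringstream lines(sdp);
    for (std::string line; std::getline(lines, line);) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // SDP uses CRLF
        if (line.size() < 2 || line[0] != 'm' || line[1] != '=') continue;
        std::istringstream fields(line.substr(2));
        SdpMedia m;
        fields >> m.media >> m.port >> m.proto;
        for (std::string f; fields >> f;) m.formats.push_back(f);
        media.push_back(m);
    }
    return media;
}
```

For RTP/AVP media the formats are PT numbers (0 is PCMU, 31 is H.261 video), which is the same information Jingle carries in its payload-type elements.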

XEP-0167 also just says
After successful transport negotiation (not shown here)
><

But at last I found
10.2 Jingle Audio via RTP, Negotiated with ICE-UDP
ICE-UDP is next on the reading list.

There are lots of other useful XMPP stanza examples too:
10.3 Jingle Audio and Video via RTP, Negotiated with ICE-UDP
10.4 Jingle Audio via SRTP, Negotiated with ICE-UDP
I'll come back to them later.

There seems to be something called SRTP:
The Secure Real-time Transport Protocol (SRTP)
I'll read it later.

How to establish a P2P voice chat session over XMPP, part 1

A memo to organize my thoughts

Jingle, which is reportedly used in Google Talk

XEP-0166: Jingle
XMPP protocol extension for initiating and managing peer-to-peer media sessions between two XMPP entities in a way that is interoperable with existing Internet standards
An XMPP protocol extension for establishing and managing P2P media sessions between two XMPP entities in a way that interoperates with existing Internet standards.

In essence, Jingle enables two XMPP entities (e.g., romeo@montague.lit and juliet@capulet.lit) to set up, manage, and tear down a multimedia session. The negotiation takes place over XMPP, and the media transfer takes place outside of XMPP. The simplest session flow is as follows:
Session negotiation happens over XMPP; the media flows outside XMPP.

[code]
Initiator                            Responder
    |                                    |
    |   session-initiate                 |
    |----------------------------------->|
    |   ack                              |
    |<-----------------------------------|
    |   [transport negotiation]          |
    |<---------------------------------->|
    |   session-accept                   |
    |<-----------------------------------|
    |   ack                              |
    |----------------------------------->|
    |   AUDIO (RTP)                      |
    |<==================================>|
    |                                    |
[/code]
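Here is the simplest flow above as a tiny initiator-side state machine. The state names and string events are hypothetical. Only session-initiate, session-accept and session-terminate are actual Jingle actions, and error handling (rejections, timeouts) is left out.

```cpp
#include <cassert>
#include <string>

enum class SessionState { kIdle, kInitiateSent, kNegotiating, kActive, kEnded };

// Hypothetical initiator-side view of the basic Jingle flow.
class InitiatorSession {
public:
    SessionState state() const { return state_; }

    // Returns false (and leaves the state unchanged) for an out-of-order event.
    bool OnEvent(const std::string& event) {
        switch (state_) {
            case SessionState::kIdle:
                return Move(event == "send session-initiate", SessionState::kInitiateSent);
            case SessionState::kInitiateSent:
                return Move(event == "recv ack", SessionState::kNegotiating);
            case SessionState::kNegotiating:
                // Transport negotiation may take several round trips.
                if (event == "transport negotiation") return true;
                // After session-accept the initiator sends its ack and media starts.
                return Move(event == "recv session-accept", SessionState::kActive);
            case SessionState::kActive:
                return Move(event == "session-terminate", SessionState::kEnded);
            case SessionState::kEnded:
                return false;
        }
        return false;
    }

private:
    SessionState state_ = SessionState::kIdle;

    bool Move(bool ok, SessionState next) {
        if (ok) state_ = next;
        return ok;
    }
};
```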

Example 1. Initiator sends session-initiate

[code]<iq from='romeo@montague.lit/orchard'
    id='jingle1'
    to='juliet@capulet.lit/balcony'
    type='set'>
  <jingle xmlns='urn:xmpp:tmp:jingle'
          action='session-initiate'
          initiator='romeo@montague.lit/orchard'
          sid='a73sjjvkla37jfea'>
    <content creator='initiator' name='voice'>
      <description xmlns='urn:xmpp:tmp:jingle:apps:rtp' media='audio'>
        <payload-type id='96' name='speex' clockrate='16000'/>
        <payload-type id='97' name='speex' clockrate='8000'/>
        <payload-type id='18' name='G729'/>
        <payload-type id='0' name='PCMU'/>
        <payload-type id='103' name='L16' clockrate='16000' channels='2'/>
        <payload-type id='98' name='x-ISAC' clockrate='8000'/>
      </description>
      <transport xmlns='urn:xmpp:tmp:jingle:transports:ice-udp'/>
    </content>
  </jingle>
</iq>[/code]
Example 2. Responder acknowledges session-initiate

**After successful transport negotiation (not shown here),** the responder accepts the session by sending a session-accept action to the initiator.

Example 3. Responder definitively accepts the session

[code]<candidate component='1'
           foundation='1'
           generation='0'
           ip='192.0.2.3'
           network='1'
           port='45664'
           priority='1678246398'
           protocol='udp'
           pwd='asd88fgpdd777uzjYhagZg'
           type='srflx'
           ufrag='8hhy'/>[/code]

Here:
・the response is narrowed down to the payload-types the receiver can use (?)
・the IP address and port go from the receiver, juliet, to the sender, romeo.

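How are the IP address and port obtained?
This example uses 192.0.2.3, a documentation-range address, but for NAT traversal you need the global IP and port on the outside of the NAT, which is what type='srflx' (server reflexive) refers to.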

Next, look into the transport negotiation that XEP-0166: Jingle leaves out.

maaash.jp

what I create
