Category Archives: Edge
Ran into a very interesting issue recently at a customer. Below is the scenario:
OCS 2007 R2
Two pools, each with an associated edge pool.
Associated Edge Pool : EDGEPOOL01
Audio/Video Edge Public Interface: AV1.CONTOSO.COM
Associated Edge Pool: EDGEPOOL02
Audio/Video Edge Public Interface: AV2.CONTOSO.COM
First, the issue: External users homed on POOL02 cannot make/receive calls through the Edge.
We took Wireshark/Network Monitor traces from the external client when the client attempted to make an audio call. While reviewing traces of the call flow, the following error was thrown during the attempted allocate request(To see more details on the expected behavior of this process, check out the Lync Resource Kit Edge Chapter):
The Username Supplied in the request is not known.
The user was sending an allocate request with all required information, Username, Nonce, Realm and Message-Integrity however the A/V Edge Service was rejecting the authentication request stating that the username was unknown.
Next, we reviewed the client UCCAPI log (located at %userprofile%\tracing\Communicator-uccapi-0.uccapilog). When reviewing for the initial SIP INVITE from the user, the candidate list is incomplete. External users must also send Reflexive (Home router public IP Address) and Relay (A/V Edge Interface) IP and port combinations that have been allocated for media.
The initial thought was to attempt a connection to the A/V Edge Public Interface on 443. When users initiate calls they must be able to contact the server on 443 TCP and 3478 UDP to allocate ports. A quick telnet test proved that these connections were open. This proved the theory that the user could not allocate ports with the edge, although it could contact the edge on the proper ports.
m=audio 54614 RTP/AVP 114 111 112 115 116 4 8 0 97 13 118 101
a=candidate:1 1 UDP 2130706431 192.168.1.103 54614 typ host
a=candidate:1 2 UDP 2130705918 192.168.1.103 54604 typ host
a=candidate:4 1 TCP-ACT 1684798463 192.168.1.103 54614 typ srflx raddr 192.168.1.103 rport 54614
a=candidate:4 2 TCP-ACT 1684797950 192.168.1.103 54614 typ srflx raddr 192.168.1.103 rport 54614
a=cryptoscale:1 client AES_CM_128_HMAC_SHA1_80 inline:nMf0n5KQE7L+fajVqoWo+DCMzKj7lHLfwskTMOTt|2^31|1:1
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:XykNc+3nFqRWu3l5IJJs/cAFvsUqaL5/ZaVdRhoa|2^31|1:1
The next step was to review the MRAS request during sign on to validate that it was actually receiving valid media relay credentials, and this is where the issue was spotted. To do this, we opened the client UCCAPI log and searched for MRAS(Detailed Information on this process, and tracking these processes can be found in the Lync Resource Kit Edge Chapter): In the MRAS request the client receives a valid 200 OK From the server, with what would be assumed are valid credentials and server information:
<response xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xmlns:xsd=”http://www.w3.org/2001/XMLSchema” requestID=”80980176″ version=”2.0″ serverVersion=”2.0″ to=”sip:EDGEPOOL02.firstname.lastname@example.org;gruu;opaque=srvr:MRAS:k44hfHH-N0O1pJWhN9MnEwAA” from=”sip:USER@contoso.com” reasonPhrase=”OK” xmlns=”http://schemas.microsoft.com/2006/09/sip/mrasp”>
Because the user is associated with POOL02, it should have received AV2.CONTOSO.COM as its public A/V Edge for Media Relay. However, due to a misconfiguration on the edge pool, the MRAS service was handing back the POOL01 A/V Edge Service. Because of this, the user would connect to that edge pool, but when attempting to allocate ports, the edge server had no idea who that user was.
The fix for this issue was to validate the R2 Edge External Interface configuration, we found that AV.CONTOSO.COM was configured as the public DNS name for POOL02, when it should have been AV2. CONTOSO.COM. As soon as this was updated, the issue was resolved.
Below is a reference diagram to help understand the issue.
It has been a while since I have had a post over here, guess you can blame the holiday season as well as the busy beginning of the year at Winxnet. Anyways, I have been working with PSS on an issue with external live meeting through the edge for quite some time now, and with that has been lots of performance monitor collection, after clicking through all of the different collectors multiple times, I decided to create some templates to have for future use and wanted to share them. Its nothing special, no awesome script or anything complicated, but a very basic tool that may be useful to anyone going forward.
Above is a link for access to these files on my sky drive. If you have any issues accessing these please leave a comment or contact me via email and I will get them to you.
They are very easy to use, once downloaded, open Reliability and Performance Monitor from either your Front End or Edge Server…
Expand “Data Collector Sets” and right click on the User Defined folder. Choose New->Data Collector Set
Name the collector set whatever you would like and make sure to choose Create from a template.
Click next to access the next page in the configuration wizard. Choose Browse and locate the XML file you downloaded containing the template information. Once you select that file the page should look like this:
The next two screens will ask you where to save this file, I would suggest a drive with plenty of space as these can get very large depending on the amount of traffic on your server and how long they are running.
When you are ready to collect data simply right click on the set you created and choose start.
Once you are ready to analyze data, or send to Microsoft PSS for data analysis you simply choose stop, and you will have a file in the location you specified. Microsoft PSS uses a tool called PAL(Performance Analysis of Logs) which is an open source application written by a Microsoft employee. This tool can be found here: http://www.codeplex.com/PAL If you are feeling up to performing some of your own analysis this is a great tool to use. I may try to post some more detailed information on using this tool soon.
The templates included in my link include the following Counters:
All <LC: > Counters
Hopefully soon I will have a new post describing the fix for this strange live meeting issue, until then, Enjoy!
On a recent deployment I ran into an issue where everything was working correctly except an external user trying to join or create an Audio Video Conference. The customer had an enterprise edition consolidated configuration behind an F5 Load Balancer. Doing our initial sip traces we were able to see a 500 error when the external user would try to join or create a conference.
Start-Line: SIP/2.0 500 The server encountered an unexpected internal error
ms-diagnostics: 3080;reason="Internal Error: AddUser failed";source="front end server fqdn"
I removed most of the trace except the important parts. What you will see in the above trace is the SIP 500 error, and then at the bottom the AddUser is failing on the front end server. This exact symptom with an enterprise pool behind load balancers points to this KB article: http://support.microsoft.com/kb/946091. This fix explains an issue with the load balancer being in DNAT mode instead of SNAT mode. However our F5 was using SNAT for all of the OCS traffic, and the pool setting was correctly set to not be in DNAT mode.
Running more traces another error popped up which was a SIP 403 Forbidden:
SIP/2.0 403 Forbidden
SERVER: RTCC/22.214.171.124 MRAS/2.0
ms-edge-proxy-message-trust: ms-source-type=EdgeProxyGenerated;ms-ep-fqdn=Edge Internal interfacefqdn;ms-source-verified-user=verified
Ms-diagnostics: 9006;source="Edge Internal interfacefqdn";reason="Forbidden";component="Media Relay Authentication Service"
This basically means that the front end server is not able to get media relay authentication from the edge server A/V internal interface.
If this is happening you will also see an error in the event logs:
Log Name: Office Communications Server
Source: OCS Audio-Video Conferencing Server
Date: 9/25/2009 4:12:14 PM
Event ID: 32018
Task Category: (1017)
Computer: FRONT END SERVER FQDN
The Audio-Video Conferencing Server encountered an error when requesting credentials from the A/V Edge Authentication Service.
A/V Authentication Service Service URI sip:EdgeInternalFQDN@swk.pri;gruu;opaque=srvr:MRAS:HqCEupOMck6C3onsDHul1wAA, Reason: The operation has failed. See the exception’s properties as well as the logs for additional information.
Cause: The Audio-Video Conferencing Server cannot communicate with A/V Authentication Service.
Check the A/V Authentication Service is alive and that network connectivity exists.
Connectivity was available through the internal edge VIP as well as each individual edge server’s internal interface. Also, if you ran an A/V Conferencing Validation on each of the front end servers it would succeed on all tests.
I ran through this with PSS and there were two things we discovered. The first potential issue was on the Internal tab setting of the edge server. Per the Microsoft documentation when doing an enterprise deployment the name that should be listed on the “Internal Servers Authorized to Connect to this edge server” setting is the pool FQDN, not each individual front end server. There has been some debate about whether you should add the FQDN of each front end server to this list as well, because we were seeing the front end servers get denied access to the A/V Authentication service we decided to try it anyways.
The other change that was made was in the forest global settings section. On the general tab you specify your internal SIP domains and you check one for the default routing domain. In this case the customer AD domain was different from the SIP domain, both were listed, however the AD Domain was checked as the domain to be used for the default routing. Once we changed that setting to have the SIP Domain as the default routing domain and restarted the services on the front end servers, A/V conferencing started functioning properly.
I am hoping I can remove each setting and try to narrow it down to one ,but either way the internal interface setting has proved to fix some funky issues in deployments, so both of these may want to be set regardless.