Ticket #4 (closed Bug: Fixed)

Opened 5 years ago

Last modified 2 years ago

more buffering problems

Reported by: electroteque Owned by: joachim
Priority: Major Component: Streaming and Networking
Version: Keywords:
Cc:

Description

Hi, I spent quite a while doing some buffering comparisons against FMS2. I was using both a 180k and a 300k video on a 512k connection, connecting to a Dell server in our office in Sydney and to our server in NY. I ran tests with a 10 second and a 2 second buffer.

I still feel there are some major buffering problems, and any advice on how to rectify them would be great. VOD also streams better than live streaming, which I can't even stream properly yet. Let me know, thanks.

Also, is there some kind of proxying / caching that may help here?

Office Server:

10 Second buffer

red5 with / without buffer configure

108 secs: goes down to 2 then increases again; 144 secs: 30 sec buffer, maintains good buffer; 298 secs: lots of movement, rebuffers

fms:

100 secs: 50 sec buffer; 200 secs: 50 sec buffer; maintains buffer throughout

2 Second Buffer

red5

30 secs: 5 sec buffer; 75 secs: rebuffers; 84 secs: rebuffers; doesn't retain buffer

fms

30 secs: 8 sec buffer; 50 secs: 15 sec buffer; 70 secs: degrades down to 10, ramps up again to 20, continues to increase and decrease between 10 and 20

Remote Server:

10 second buffer

180k

red5

41 secs: rebuffer; 82 secs: rebuffer; 186 secs: rebuffer

fms

buffers fine, keeps increasing.

300k

red5 with / without buffer configure

36 secs: rebuffer; 69 secs: rebuffer; 103 secs: rebuffer

fms:

74 secs: rebuffer; 180 secs: peaks at 18 sec buffer; 200 secs: 10 sec buffer; 240 secs: rebuffer; 279 secs: rebuffer

2 Second Buffer

red5

2 sec buffer constantly rebuffers

fms

rebuffers at 42 secs, doesn't retain buffer

180k

red5

10 secs: rebuffer; 20 secs: rebuffer; 60 secs: 10 sec buffer

fms retains buffer

08/29/06 03:00:10 EST changed by spam@…

Hi, it looks like there is some kind of pattern to why it keeps rebuffering: someone viewing a 700k video on an 8MB ADSL connection started rebuffering at 47 secs, and my 512k connection also started rebuffering at 47 seconds.

09/05/06 19:18:19 EST changed by spam@…

Hi, it seems that Red5 can't keep a buffer for live webcam streams at all; it can only manage VOD OK. The frame rate cannot get over 2fps and it doesn't hold a buffer of 20 seconds. FMS seems able to hold the buffer quite strongly while keeping a frame rate of 8fps, although I did get a rebuffer 170 seconds in. This is set to a 130k stream all up, and I have a fast broadband connection, so it should be fine. Is this also affected by lag in the connection to the server?

We tried a user on a Comcast cable connection in the States and, obviously, there were no rebuffer problems to our server in NY; testing from Australia, however, is not so good.

09/11/06 03:42:27 EST changed by spam@…

Hi, it seems I've stabilised our cam frame dropping and buffering problem by using a dev licence of FMS for the moment. However, we still have a problem with VOD streaming. Why would progressive download via IIS, on the same server as Red5, work better than streaming a video via Red5? What are the main benefits of using Red5, seeing that the downloaded files actually play and run better? None of it works atm; I've even taken off the bandwidth limit options.

Attachments

SampleVideoChatApplication_Flex.2.zip Download (227.5 KB) - added by vicnov 3 years ago.

Change History

Changed 5 years ago by luke

I think we have an issue with latency having a bad effect on throughput. I have noticed this myself with a demo running at 1 Mbit with the server in the States: people in the States can view without buffering, but for people in the UK it buffers every 10s for about 10s. Progressive download is fine for people in the UK. So we used a server in the UK, and it's fine. The difference in latency is 20 (works) versus 120 (doesn't work).

The cause of this is unknown, but it is likely to be a low-level network issue, possibly something about the way mina is working for us. Before a solution can be found, I'm going to do some research into streaming with mina and write a simple test server and client that we can test with.
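
One rough way to see how latency alone could cap throughput: if a sender only keeps a fixed amount of data in flight per round trip, the ceiling is that amount divided by the round-trip time. The sketch below is only an illustration of that arithmetic. The 16 KB in-flight figure is a made-up assumption, not a measured Red5 or mina value, and the latencies are read as milliseconds, which the comment above does not state.

```java
// Illustrative arithmetic only; nothing here is taken from Red5 or mina.
final class LatencyBound {
    /** Upper bound in bit/s when at most bytesInFlight are outstanding per round trip. */
    static double maxBitsPerSecond(int bytesInFlight, double rttMillis) {
        return bytesInFlight * 8.0 / (rttMillis / 1000.0);
    }

    public static void main(String[] args) {
        int inFlight = 16 * 1024; // hypothetical amount of data outstanding per round trip
        for (double rtt : new double[] {20, 120}) { // assumed to be milliseconds
            System.out.printf("RTT %.0f ms -> at most %.0f kbit/s%n",
                    rtt, maxBitsPerSecond(inFlight, rtt) / 1000.0);
        }
    }
}
```

Under that assumption the ceiling is roughly 6.5 Mbit/s at 20 ms but only about 1.1 Mbit/s at 120 ms, which would leave almost no headroom for a 1 Mbit stream. Whether anything like this is actually happening inside the server is not established here.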


Changed 5 years ago by electroteque

OK, the problem obviously resolves itself when we move to progressive download; it works without a problem. I guess the client is controlling the throughput rather than the server when downloading. Maybe it's a problem with how the client streams? Unfortunately you can't change the buffer on the client on the fly. It's obviously downloading it all before the playback can keep up?

Changed 5 years ago by skipidar

I have the same problems. I'm trying to create a live video chat with a small delay, but can't: the delay grows over time along with the buffer... I hope that you'll find a solution, because FMS is working fine; its delay is very small and stable. I tried different connections, and even on 100 Mbps you can get a growing buffer.

Changed 5 years ago by electroteque

Hi there, I'm revisiting the buffering problem, as it's definitely something to do with Red5 dropping packets over longer routes; however, I will show you the traceroute to the server in question. I'm collating what I sent to the list.

Click the video window the status window will appear and click to
remove. You will find the buffer length in FMS stays constant it picks
up while its playing. So it seems while its playing it constantly keeps
the 8 second buffer, and starts immediately , it seems to stop before
the duration though !

 http://69.42.91.84:5080/FMS.swf

Now check the red5 copy of the exact same code. It prebuffers, unlike FMS,
and the buffer length dies after the allowed buffer time. It doesn't
seem to pick up while playing, so it plays, buffers, plays, buffers etc.,
unlike FMS, which keeps adding to the buffer while it is playing.
Something isn't right in pushing the data rate out. It's not a route
issue as I would have thought; it could still be a network utilisation
thing, but that's a network thing, so way over my head.

 http://69.42.91.84:5080/RED5.swf

Updated red5 trunk on my locally hosted ubuntu server
stored in a data centre in Sydney. The buffer length reaches up to 50 or
more ! It never missed a beat. Maybe there is a network utilisation
issue with red5, or is it the server route to the servers in NY ?

 http://69.42.91.84:5080/RED5_AU.swf

We've put scrub control in now so cant go back to progressive
downloading unless there is a way to scrub maybe start playing at the
set position rather than seek ? Maybe thats how google video does it ?

It seems since making a few of the players live cpu has picked up to 20%
and java is using 500MB of ram, ive set a min of 500 and max of 1000 on
the memory, should i be extending this ? What makes it want to use up so
much memory and stay there, when i reboot it goes back to 75MB then
picks up from there, something is filling it and not releasing.

 http://69.42.91.84:5080/PROG.swf

Progressive Download example very stable, not a route problem, scrub
works but it recognises its prog download and seeks 10 seconds ahead,
anything more than 10 seconds it complains that its a bad seek point.

We are streaming wmv from the same network in NY and it will buffer once
or twice and thats it.

Im experiencing odd messages from the red5 server in NY, im guessing
maybe this is the problem

Here are the messages on the local box

[INFO] 148045 SocketAcceptorIoProcessor-0.1:( org.red5.demos.fitc.Application.info ) Client connected 1 conn RTMPMinaConnection from 60.241.190.74:50360 to 202.4.232.4 (in: 3483, out: 3073)
[INFO] 148046 SocketAcceptorIoProcessor-0.1:( org.red5.demos.fitc.Application.info ) Setting stream id: 0
[INFO] 148046 SocketAcceptorIoProcessor-0.1:( org.red5.demos.fitc.Application.info ) Client joined app 1
[INFO] 148082 SocketAcceptorIoProcessor-0.1:( org.red5.demos.fitc.Application.info ) Received result {level=error, code=NetConnection.Call.Failed} for setId
[WARN] 148088 SocketAcceptorIoProcessor-0.1:( org.red5.server.net.rtmp.RTMPHandler.warn ) Unhandled ping: Ping: 3, 0, 8000, -1
00 03 00 00 00 00 00 00 1F 40
[INFO] 228040 SocketAcceptorIoProcessor-0.1:( org.red5.server.stream.PlaylistSubscriberStream.info ) Scheduled stop in: 68946
[INFO] 297014 DefaultQuartzScheduler_Worker-5:( org.red5.server.stream.PlaylistSubscriberStream.info ) in stop
[INFO] 297158 DefaultQuartzScheduler_Worker-5:( org.red5.server.stream.PlaylistSubscriberStream.info ) Stop
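
For what it's worth, the ten bytes dumped under the "Unhandled ping" WARN line can be read as a 2-byte event type followed by two 4-byte big-endian integers. As far as I understand the RTMP user control message layout, event type 3 is the client's SetBufferLength message (stream id, then buffer length in milliseconds), so this looks like the player announcing its 8 second buffer and the server logging it as unhandled. The sketch below only decodes the bytes; the class and method names are hypothetical and not part of Red5.

```java
import java.nio.ByteBuffer;

// Decodes the hex dump from the log; PingDecode is a hypothetical helper, not a Red5 class.
final class PingDecode {
    /** Returns {eventType, streamId, value} from a 10-byte big-endian payload. */
    static int[] decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload); // ByteBuffer is big-endian by default
        int eventType = buf.getShort() & 0xFFFF;   // first 2 bytes
        int streamId = buf.getInt();               // next 4 bytes
        int value = buf.getInt();                  // last 4 bytes
        return new int[] {eventType, streamId, value};
    }

    public static void main(String[] args) {
        // 00 03 00 00 00 00 00 00 1F 40, copied from the log above
        byte[] logged = {0x00, 0x03, 0, 0, 0, 0, 0, 0, 0x1F, 0x40};
        int[] fields = decode(logged);
        System.out.println("type=" + fields[0] + " stream=" + fields[1] + " value=" + fields[2]);
    }
}
```

The decoded fields match the "3, 0, 8000" in the log line. If the server really ignores this message, it would have no knowledge of the client's buffer time, but that is an inference from the log, not something confirmed in this ticket.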

Here are the messages on the windows box. It seems like a lot of these
socketacceptor messages appear; in fact a lot of messages appear!

It also seems the IP address mapped for this server was set up wrong,
which could be the issue. It actually has two IPs mapped to it via the
data centre. red5 seems to be listening on 85.92.159.20; if I connect
to this IP it's really slow, and if I connect to the other IP it's a
little better.

tionAdapter.debug ) appConnect: RTMPMinaConnection from 74.166.228.2:50626 to 85.92.159.20 (in: 3526, out: 3073)

[DEBUG] 60688188 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) adding client
[DEBUG] 60688188 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60688188 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60688188 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60688266 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60688407 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60688407 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60688407 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60688407 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60688407 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60688407 SocketAcceptorIoProcessor-0.2:( org.red5.server.Scope.debug ) returning context
[DEBUG] 60689625 SocketAcceptorIoProcessor-0.1:( org.red5.server.Scope.debug ) returning context

Another thing ive noticed still is that if i set it to 8 seconds buffer,
it will start playing at 6 or 8 seconds not 0.

Here is a response regarding the test videos. It seems that the further you are from a server the worse it gets, so maybe streaming from the server in NY is fine for people in Europe and the US, I'm not sure. Streaming to Australia from NY is terrible, whereas, as mentioned before, it streamed perfectly from the local geo-located machine about 30 km away in Sydney.

I tried all three links.
FMS and RED5 worked pretty well, in the same way. It got stuck just once.

RED5 AU got stuck all the time. It buffered, played a few seconds, buffered
again, and so on.

I am located in Germany.

Here is the traceroute, im not a network guru so this stuff is way over my head

iElectro:~ electroteque$ traceroute 69.42.91.84
traceroute to 69.42.91.84 (69.42.91.84), 64 hops max, 40 byte packets

1 home.gateway (192.168.1.254) 1.753 ms 0.543 ms 0.451 ms
2 10.20.20.168 (10.20.20.168) 25.358 ms 26.999 ms 25.749 ms
3 202.7.166.209 (202.7.166.209) 26.468 ms 27.467 ms 36.526 ms
4 syd-pow-ibo-zeu-1-ge-2-2.tpgi.com.au (202.7.166.199) 26.240 ms 25.455 ms 27.245 ms
5 202.92.127.85 (202.92.127.85) 27.452 ms 26.840 ms 35.866 ms
6 syd-core-p-01-ge102.powertel.net.au (202.92.64.138) 27.574 ms 27.414 ms 27.342 ms
7 unknown.net.reach.com (134.159.126.133) 27.289 ms 27.411 ms 26.997 ms
8 i-5-0.syd-core04.net.reach.com (202.84.144.250) 28.264 ms 26.517 ms 28.019 ms
9 i-4-0.paix-core01.net.reach.com (202.84.144.154) 215.625 ms 185.991 ms 184.678 ms

10 g3_7-pax06.net.reach.com (202.84.251.86) 343.903 ms 185.925 ms 207.515 ms
11 gblx.peer.paix05.net.reach.com (134.159.62.98) 185.477 ms 185.333 ms 192.782 ms
12 te1-2-10g.ar4.nyc1.gblx.net (67.17.75.102) 262.678 ms 260.825 ms 379.762 ms
13 webair-internet.g6-4.ar4.nyc1.gblx.net (64.215.187.74) 293.968 ms 262.984 ms 267.949 ms
14 esa080.nyc.webair.net (209.200.31.85) 276.170 ms 260.101 ms 262.573 ms
15 csa080.nyc.webair.net (209.200.31.10) 272.708 ms 267.185 ms 267.381 ms
16 red5 (69.42.91.84) 268.361 ms 268.505 ms 268.612 ms

Changed 5 years ago by electroteque

Hi, I've yet to confirm with FMS, but testing red5 locally I've discovered that the buffer length value is the value of the duration, and on playback it decreases over time, i.e. after 1 second of playback the buffer length decreases by 1 second.

I think this is the issue, as I'm discovering that over a long route to the server it decreases on playback instead of increasing. Is this the expected output?

Changed 5 years ago by electroteque

Ok confirming that with FMS, the buffer length starts at 16 and then after 8 seconds when it reaches a buffer length of 9 it will jump back to 16. Im using an 8 second buffer.

If I change this to a 0.1 second buffer, which is the default if you don't set a time, it stays at 2 seconds and may jump down to 1 second.

As for red5

For a 0.1 and an 8 second buffer it starts at 40, then jumps to 95, then 130, then 140, and then starts dropping from there, going down to 0 as it plays back.

I'm fairly confident that this discrepancy in how buffers are handled could be the problem. In my case, with an 8 second buffer, it will buffer for 8 seconds, then on playback buffer for another 8 seconds, start playback at 8 seconds, and then the buffer length counts down to 0 during playback; it buffers for another 8 seconds and counts down to 0 again.
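
The play, drain, rebuffer cycle described above is what a client buffer does whenever media arrives more slowly than it is played. Here is a toy model of that, not Flash Player's or Red5's actual logic: `deliveryRatio` is a hypothetical knob meaning "seconds of media delivered per second of wall-clock time", and playback starts once `bufferTime` seconds are buffered.

```java
// Toy client-buffer model; all names and the 1-second time step are assumptions.
final class BufferSim {
    /** Counts how often the buffer runs empty during totalSeconds of wall-clock time. */
    static int countRebuffers(double bufferTime, double deliveryRatio, int totalSeconds) {
        double buffered = 0.0;   // seconds of media currently in the client buffer
        boolean playing = false;
        int rebuffers = 0;
        for (int t = 0; t < totalSeconds; t++) {
            buffered += deliveryRatio;          // media arriving during this second
            if (playing) {
                buffered -= 1.0;                // one second of media is played
                if (buffered <= 0.0) {          // buffer ran dry: stop and refill
                    buffered = 0.0;
                    playing = false;
                    rebuffers++;
                }
            } else if (buffered >= bufferTime) {
                playing = true;                 // buffer is full, playback (re)starts
            }
        }
        return rebuffers;
    }

    public static void main(String[] args) {
        System.out.println("ratio 1.0: " + countRebuffers(8, 1.0, 300) + " rebuffers");
        System.out.println("ratio 0.5: " + countRebuffers(8, 0.5, 300) + " rebuffers");
    }
}
```

With a ratio of 1.0 or more the buffer never empties in this model; with anything below 1.0 it counts down to 0 and rebuffers at a regular interval, similar to the pattern reported above. The model says nothing about why delivery would be slower than playback.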

As a matter of fact, for some odd reason, checking it again FMS seems to be dropping out as well now :)

 http://69.42.91.84:5080/FMS.swf

I'll have to chat with the network guy again.

Changed 5 years ago by electroteque

I've updated some links to point to one of our slave servers, which currently runs windows media. Some interesting results: FMS on the other server was only keeping a buffer length of 6, while this current FMS example on the other server holds a buffer length of 20. Red5 still goes from 8, which is the buffer length, down to 0. So it seems that when the buffer length reaches the buffer time, FMS pushes something down the network to boost it to twice the bufferTime to keep it holding up?

 http://69.42.91.84:5080/FMS_WWW1.swf

 http://69.42.91.84:5080/RED5_WWW1.swf

the previous examples were

 http://69.42.91.84:5080/RED5.swf

 http://69.42.91.84:5080/FMS.swf

The FMS example here won't work just yet, because I just tried to reinstall FMS to double check it wasn't an installation problem, and it requires a machine reboot! Maybe this thing is best on Linux, which is preferable to windoze, though all our unix boxes are FreeBSD.

Changed 5 years ago by electroteque

Yet another example, this time with a dev licence of wowza; you can see what it does by comparing the buffer lengths.

Yet another comparison: I've managed to get wowza going, loading files from the same shared network path as red5, on the same server. Wowza keeps a buffer length of 10 but doesn't seem to grow like FMS either? It just goes back and forth between 10 and 11.

red5 is the same: it goes from 8 to 0. I think the same latency problem affects live publishing as well. It takes much longer to start playing, and it does that jumping thing at the start and rebuffers.

 http://69.42.91.84:5080/WOWZA_WWW2.swf

 http://69.42.91.84:5080/RED5_WWW2.swf

Changed 5 years ago by electroteque

Interesting, on a live stream scenario, wowza rebuffered but red5 was ok
keeping a buffer length between 10-15. The biggest problem faced so far
is that we set the frames per second to be 8fps but we can only manage
to get 1fps out of it.

Wowza was also playing back the archive not live stream lol.

The other issue I'm facing is that the buffer length is much worse when
archiving than when just in live mode; there are definitely problems, it
disconnects and causes exceptions.

There is a big difference in how red5 handles buffering for live and vod
playback, that's for sure :\

Yes there is a slight difference in buffering when archiving and just in
live publish mode. The frame rate is always 1 fps so its sticky, no
setting seems to want to change it.

Changed 5 years ago by joachim

Buffering should be fine in r1822. I tested with the Spiderman trailer, which has ~496 kbit/s, through a 500 kbit/s connection with a buffer time of 10 seconds. Throughout the whole video, the buffer on the client was between 7 and 11 seconds and no Buffer.Empty events occurred.
Please report either here or on the list how the trunk performs in your setup.

Changed 5 years ago by electroteque

Hi, this isn't fixed at all; the problem is actually worse. The stream will now freeze without any errors returned.

 http://69.42.91.84:5080/RED5_WWW2.swf

Im going to try and upload the jar files again and try again.

Changed 5 years ago by electroteque

It seems that if I set the buffer time to anything more than 2 seconds, it will stall after it has finished buffering. It still buffers, plays for 2 seconds, buffers again and tries to start playing at the buffer time, not 0.

Changed 5 years ago by electroteque

If I pause on the buffer empty event and then resume on the buffer full event, it will start playing back at one. It seems like it's trying to play while it's buffering, causing it to be at 8 seconds when it tries to play on startup. Doing this makes the playback freeze at 11 seconds, not 7.

Changed 5 years ago by electroteque

I thought it was because of the flex app; here is what happens in a flash 8 projector

NS.onStatus> info.code: NetStream.Play.Reset
NS.onStatus> info.code: NetStream.Play.Start
NS.onStatus> info.code: NetStream.Buffer.Full
NC.onStatus> info.code: NetConnection.Connect.Closed

at around 7 seconds.

Changed 5 years ago by electroteque

I reverted to the same jar files as on our live windows machine. I'm unsure which version it is, but it started playing fine; something is really broken between this trunk version and the very latest.

Changed 5 years ago by electroteque

Apologies, I just tried my flex player in a standalone flash and it plays right away. This seems to be some weird thing with debugging in flex, maybe on the PC, which is making it buffer and play that way?

It's still dropping the buffer length after each rebuffer, so every 8 seconds it rebuffers. It's not fixed, and it's not a route issue to the server either, because windows media streaming is fine, and, as above, it was tested on Wowza and FMS and seems OK, although wowza live streaming was terrible.

Changed 5 years ago by joachim

Should be fixed with Lukes latest fixes in r1860.

Changed 5 years ago by vicnov

I reproduced the same or very similar problem in a recent Red5 trunk, rev 2225.
First, I will describe the problem briefly and then prepare a detailed bug report and post my test application here.

Ok, I can NOT reproduce any buffering problems with Red5's "videoConference" sample (taken from SVN),
but I CAN see a problem with my Flex-based sample. So the difference is that my sample uses Flex at client-side, but Red5's "videoConference" uses pure Flash.

Briefly, the problem is:
if we have one very slow video consumer (it receives data slowly from Red5) and this consumer is a Flex app, then ALL OTHER consumers (of the same video provider) are also working slowly - Red5 does not send data to them.

Ok, wait a little - I will post the detailed report and a proposed workaround which we have found today.

Changed 5 years ago by vicnov

Let me describe my test example.

We have a "video chat" application where multiple users can watch somebody's video broadcast.
This is a LIVE chat, I mean live, real-time broadcast from one user (provider) to many users (consumers).

How components are deployed:
Red5 + our server-side logic is deployed to Tomcat on a Linux computer,
"provider" is a Flex app on a Windows machine,
"slow_consumer" is the same Flex app on a different Windows machine,
"fast_consumer" is, again, the Flex app on a separate Windows machine.
So we have 4 computers (1 server + 1 provider + 2 consumers on different machines).
Both the "slow_consumer" and the "fast_consumer" are watching the live video broadcast
from the "provider".

All the 4 computers are in one fast low-latency LAN (Ethernet, 100 Mbit/s).
IMPORTANT thing in this test is that the "slow_consumer" is working on the computer where CPU is OVERLOADED, its CPU usage is always 100%.
The other 3 computers (the server, the "provider" and the "fast_consumer") are under normal load.

Let us look how the test was organized.
The "provider" sends video data from its video cam to the Red5 server at almost constant rate ~300 kbit/s.

While the CPU load is normal on both consumers - no problem, they both receive the video from server at the same rate of 300 kbit/s. Without any serious delay, without freeze, without dropping frames. All the 4 computers are in the same room, so I can see all of them in the same time.

BUT...
when I run a "cpu killer" app (for example, 10-20 copies of an endless loop app, in addition, start copying of many big files), the situation is changed.
First of all, the "slow_consumer" begins working slowly, a delay appears in video - yes, it's ok because it is slow, overloaded, I see.
But the same problem occurs on the "fast_consumer" - that is on a separate computer! So when one of consumers of a broadcast is slow (it can not consume data at the required speed), the problem occurs on ALL consumers.

Additional notes.
I monitored CPU/memory usage at all the 4 computers in the test, also I monitored network usage.
I can say that all computers except the "slow_consumer" worked normally - CPU usage was very low,
memory consumption was also low.
I have noted that network traffic was different in 2 situations:
1) all the computers under normal load - 300 kbit/s for each "server to consumer" stream
2) "slow_consumer" is under high load - 15 kbit/s for EACH "server to consumer" stream

(20 times slower than the original "provider to server" stream !)

P.S. Details about my testing environment and tools:
- Tomcat 6.0, JDK 6.0
- Linux - Ubuntu, Kernel 2.6.20
- Windows - Windows XP sp1 (for both consumers), Windows Vista (for the provider).
- JConsole used for memory/cpu monitoring
- tcptrack - for network usage in Linux (shows all connections and their bandwidth)
- Flash player 9.0

Changed 5 years ago by vicnov

Interesting thing - I can not reproduce the problem with "VideoConference" example (taken from Red5 SVN repository). Even if I increase video quality and run 10 copies of videoConference.swf (watching video) on the same computer (and start "cpu killers"). The computer begins working slowly, fps is low, some frames are dropped, but there is no such problem as with my example on Flex.

So maybe there are some differences between pure Flash and Flex platforms in how they work with video (in "consumer" role).

"VideoConference" example (pure Flash, requires Flash Player 8 or later) drops some frames but it continues showing video, the video is shown without delay in fact (only dropped frames - it is OK because the computer was intentionally under high load) - as a result, the TCP buffer on the Flash player's side becomes ready to receive new data from Red5 server, then Red5 is ready to handle a new portion of data (send it to all consumers) - in this case everything is working perfectly on all computers (even on slow ones).

Flex (at least in my example) works in a different way - it tries to render ALL FRAMES (it does not drop frames as I can see), but since the computer is under high load, it is not fast enough to render all frames - that's why the TCP buffer on the receiver's side is always full and Flash Player can not receive more data.

Maybe, I misunderstand something? Maybe I can configure Flash player somehow to allow frame dropping?

ANYWAY - even if one of the video consumers works slowly, it seems incorrect that other consumers work slowly too. It seems that we could do something in the Red5 code to fix this. I'm not sure exactly where the cause is; it seems that the "mina" library uses asynchronous (non-blocking) write operations, so it should not block.

Changed 5 years ago by vicnov

One additional detail,

when running my Flex-based live conference under high load, the TCP stack on the client side signals that the TCP window size is 0 ("zero window"), which means that the TCP buffer on the client side is full and cannot receive more data from Red5.
Red5 does not send more data in this case TO THIS CLIENT (len=0 in all TCP segments), but why does Red5 not send data to OTHER CLIENTS?
Other clients declare a normal window size (~64 kbytes), but server does not send data to them (len=0 in all TCP segments for each client) because it is blocked by the first (slow) consumer.

Changed 5 years ago by vicnov

I am attaching the Flex example used in the test ("SampleVideoChatApplication_Flex.zip").

Instructions:
- build Red5 with my SampleRed5Application.java inside (using sample-red5-context.xml).
- copy ChatClient.swf and ChatClient.html somewhere to web root (example - Tomcat/webapps).
- open ChatClient.html in browser ( http://ip_addr:port/ChatClient.html).
- enter a correct URL for your environment (IP address, context root - if you changed this); example:  rtmp://localhost:1935/serverApp
- enter any user name
- make the same with another user name on a different (or the same) computer
You can click on any user to start watching his/her video.

Flash Player 9 is required because this is a Flex app.

Changed 5 years ago by vicnov

And now our today's analysis results.

We were trying to understand how Red5 works with live video streaming - how it receives video data from a provider, how it passes the data between its Java classes and how it sends it to consumers.
One more interesting question was: does Red5 perform some kind of bandwidth control when sending video data to consumers?

First of all, I can see that there is a class called "SimpleBWControlService" which implements bandwidth control - I found its description here, thanks:
 http://jira.red5.org/confluence/display/streaming/Bandwidth+Control+Framework

Yes... sorry... we thought that the problem was due to some bug in this class :) But we have found that by default (if not configured in my application) Red5 does not manage bandwidth, so this is not the cause of our problem (when one slow consumer "kills" all other consumers)!
Yes, of course, this logic is important to make slow client working as perfectly as we can, but let's talk about real cause of the problem we found today.

Finally, at the end of the day, we started our example again with one slow consumer. We ran "jstack" tool (from JDK 6) - randomly, at some moments of time - to see all stack traces in Red5. In most of the snapshots we have seen an interesting thing:

"pool-3-thread-14" prio=10 tid=0xb5340400 nid=0x47eb waiting on condition [0xb43fe000..0xb43fee30]

   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x81d238a0> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
	at org.apache.mina.common.support.DefaultIoFuture.join(DefaultIoFuture.java:68)
	at org.red5.server.net.rtmp.codec.RTMPMinaProtocolEncoder.encode(RTMPMinaProtocolEncoder.java:52)
	- locked <0x819fe348> (a org.red5.server.net.rtmp.codec.RTMP)
	at org.apache.mina.filter.codec.ProtocolCodecFilter.filterWrite(ProtocolCodecFilter.java:236)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:445)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$1400(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:824)
	at org.apache.mina.filter.executor.ExecutorFilter.filterWrite(ExecutorFilter.java:273)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:445)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$1400(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:824)
	at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.filterWrite(AbstractIoFilterChain.java:727)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:445)
	at org.apache.mina.common.support.AbstractIoFilterChain.fireFilterWrite(AbstractIoFilterChain.java:436)
	at org.apache.mina.transport.socket.nio.SocketSessionImpl.write0(SocketSessionImpl.java:196)
	at org.apache.mina.common.support.BaseIoSession.write(BaseIoSession.java:149)
	at org.apache.mina.common.support.BaseIoSession.write(BaseIoSession.java:135)
	at org.red5.server.net.rtmp.RTMPMinaConnection.write(RTMPMinaConnection.java:177)
	at org.red5.server.net.rtmp.Channel.write(Channel.java:124)
	at org.red5.server.net.rtmp.Channel.write(Channel.java:102)
	at org.red5.server.stream.consumer.ConnectionConsumer.pushMessage(ConnectionConsumer.java:137)
	at org.red5.server.messaging.InMemoryPushPushPipe.pushMessage(InMemoryPushPushPipe.java:86)
	at org.red5.server.stream.PlaylistSubscriberStream$PlayEngine.pushMessage(PlaylistSubscriberStream.java:1930)
	- locked <0x81a03090> (a org.red5.server.stream.PlaylistSubscriberStream$PlayEngine)
	at org.red5.server.messaging.InMemoryPushPushPipe.pushMessage(InMemoryPushPushPipe.java:86)
	at org.red5.server.stream.ClientBroadcastStream.dispatchEvent(ClientBroadcastStream.java:323)
	at org.red5.server.net.rtmp.BaseRTMPHandler.messageReceived(BaseRTMPHandler.java:180)
	at org.red5.server.net.rtmp.RTMPMinaIoHandler.messageReceived(RTMPMinaIoHandler.java:120)
	at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.messageReceived(AbstractIoFilterChain.java:703)
	at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:362)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:800)
	at org.apache.mina.filter.executor.ExecutorFilter.processEvent(ExecutorFilter.java:247)
	at org.apache.mina.filter.executor.ExecutorFilter$ProcessEventsRunnable.run(ExecutorFilter.java:307)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
	at java.lang.Thread.run(Thread.java:619)

As we can see RTMPMinaProtocolEncoder is waiting for something at line 52 ("encode" method, which invokes mina's "join"). We looked there and - voila - there is the following code:

synchronized (state) {
    final ByteBuffer buf = encode(state, message);
    if (buf != null) {
        out.write(buf);
        final WriteFuture future = out.flush();
        if (future != null) {
            future.join();
        }
    }
}

"out.write()" according to mina documentation is asynchronous, it does not block, but after that we are waiting until the data is sent ("future.join()")!
It is important here that Red5 has only 1 thread for sending data to ALL consumers (of the same provider); look at InMemoryPushPushPipe class, "pushMessage" method:

for (IConsumer consumer : consumerList) {
    try {
        ((IPushableConsumer) consumer).pushMessage(this, message);
    } catch (Throwable t) {
        ...
    }
}
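
To make the interaction of these two snippets concrete, here is a minimal stand-alone sketch in plain java.util.concurrent. It does not use the mina or Red5 APIs: `FanOutDemo`, `Consumer` and `framesSeenByFastConsumer` are hypothetical names, and a `CompletableFuture` that never completes stands in for a write to a client whose TCP window is zero.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of one dispatcher thread feeding several consumers.
final class FanOutDemo {
    static final class Consumer {
        final AtomicInteger accepted = new AtomicInteger();
        private final CompletableFuture<Void> writeDone; // completes when data "left the socket"

        Consumer(boolean stalled) {
            // A stalled consumer's write never completes, like a client with a zero window.
            this.writeDone = stalled
                    ? new CompletableFuture<Void>()
                    : CompletableFuture.<Void>completedFuture(null);
        }

        CompletableFuture<Void> write(String frame) {
            accepted.incrementAndGet();
            return writeDone;
        }
    }

    /** Pushes 100 frames to a stalled and a healthy consumer from one thread. */
    static int framesSeenByFastConsumer(boolean waitForEachWrite) {
        final Consumer slow = new Consumer(true);
        final Consumer fast = new Consumer(false);
        final List<Consumer> consumers = Arrays.asList(slow, fast);
        Thread dispatcher = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                for (Consumer c : consumers) {
                    CompletableFuture<Void> done = c.write("frame-" + i);
                    if (waitForEachWrite) {
                        done.join(); // plays the role of future.join() in the encoder
                    }
                }
            }
        });
        dispatcher.setDaemon(true); // a dispatcher stuck in join() must not keep the JVM alive
        dispatcher.start();
        try {
            dispatcher.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fast.accepted.get();
    }

    public static void main(String[] args) {
        System.out.println("waiting on each write: " + framesSeenByFastConsumer(true));
        System.out.println("not waiting:           " + framesSeenByFastConsumer(false));
    }
}
```

When the dispatcher waits on each write, it parks on the stalled consumer's first frame and the healthy consumer receives nothing; when it does not wait, the healthy consumer is expected to get all 100 frames. This only models the shape of the problem, with the single loop and the per-write wait taken from the snippets quoted above.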

So, after thinking at this a little, we tried the following simple fix - just comment the following lines in RTMPMinaProtocolEncoder class:

/* final WriteFuture future = out.flush();
if (future != null) {
    future.join();
} */

After that our example works as it should in this situation:
the slow consumer is almost frozen, but all other consumers work perfectly without any problems!

This is not a final solution, I see. After this fix the sending buffer in mina grows, and memory usage rises of course. I think we need to improve this fix:
1) we should limit the sending buffer size in mina somehow
2) we should implement frame dropping for slow clients - so even slow clients will not work so badly. Or maybe we should apply the bandwidth control implemented in the SimpleBWControlService class, but it should happen automatically; we should get this bandwidth from the client side somehow, or from the current state of the TCP buffers?...
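Points 1) and 2) could be combined in a per-client send queue with a hard limit. The following is only a sketch under stated assumptions: the `BoundedSendQueueSketch` class, its `Packet` type and the drop policy (discard the oldest queued video packet first, otherwise refuse the new packet) are hypothetical and are not part of Red5 or mina.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

class BoundedSendQueueSketch {

    enum Kind { AUDIO, VIDEO }

    /** Hypothetical packet: only the kind matters for the drop policy. */
    static final class Packet {
        final Kind kind;
        final String label;

        Packet(Kind kind, String label) {
            this.kind = kind;
            this.label = label;
        }
    }

    private final Deque<Packet> pending = new ArrayDeque<>();
    private final int capacity;
    private int dropped;

    BoundedSendQueueSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Queues a packet; returns false if the packet itself had to be discarded. */
    synchronized boolean offer(Packet p) {
        if (pending.size() >= capacity && !dropOldestVideo()) {
            dropped++;                  // full of non-video packets: discard the newcomer
            return false;
        }
        pending.addLast(p);
        return true;
    }

    /** Removes the oldest queued video packet, if there is one. */
    private boolean dropOldestVideo() {
        for (Iterator<Packet> it = pending.iterator(); it.hasNext();) {
            if (it.next().kind == Kind.VIDEO) {
                it.remove();
                dropped++;
                return true;
            }
        }
        return false;
    }

    /** Next packet to hand to the network layer, or null if nothing is pending. */
    synchronized Packet poll() {
        return pending.pollFirst();
    }

    synchronized int size() {
        return pending.size();
    }

    synchronized int dropped() {
        return dropped;
    }
}
```

This keeps memory per client bounded by `capacity`. A real implementation would also need to be aware of keyframes, since dropping arbitrary video packets can leave the decoder with broken frames until the next keyframe arrives.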

One more question:
why does RTMPMinaProtocolEncoder contain this "waiting" logic (until the data is really sent)? I suppose that after we have called IoSession.write in mina, the packet order should be correct. And when the data is sent we will get a callback in RTMPMinaIoHandler.messageSent().

So, it is really important for me to know what you think about this fix and how we can improve the Red5 logic for slow clients.
I will test this fix in more detail under high load and will post the results here.

Changed 5 years ago by vicnov

I think this issue exists in the latest trunk (r2228) because the classes in question have not changed since r2225.

Changed 5 years ago by paul

I have committed a work-around "fix", it is the default right now on trunk. To switch back to the old behaviour simply comment out the "Fast" class and uncomment the previous version in red5-common.xml.

    <!-- RTMP codec factory -->
    <bean id="rtmpCodecFactory" class="org.red5.server.net.rtmp.codec.RTMPMinaCodecFactory"
          autowire="byType" init-method="init">
        <!--
        <property name="minaEncoder">
            <bean class="org.red5.server.net.rtmp.codec.RTMPMinaProtocolEncoder"/>
        </property>
        -->
        <!-- Fix / option for defect SN-1 -->
        <property name="minaEncoder">
            <bean class="org.red5.server.net.rtmp.codec.FastRTMPMinaProtocolEncoder"/>
        </property>
        <property name="minaDecoder">
            <bean class="org.red5.server.net.rtmp.codec.RTMPMinaProtocolDecoder"/>
        </property>
    </bean>

Changed 5 years ago by vicnov

Paul,
you have removed the "synchronized" keyword instead of the "flush" and "join" calls in FastRTMPMinaProtocolEncoder.
"Synchronized" is not the problem. The problem is that the encoder is WAITING until the data is actually sent. Why do we need this?

Our code is the following:

    synchronized (state) {
        final ByteBuffer buf = encode(state, message);
        if (buf != null) {
            out.write(buf);
        }
    }

Now we are working on how to limit the mina buffer size (which grows when calling write() for slow consumers). A better solution would be to estimate the CURRENT data rate at which the slow client can consume data and drop frames accordingly, but this requires much more time to implement...

Changed 5 years ago by joachim

I'm reopening the issue because the solution removes a critical fix for the FP segfaulting.

The encoder is synchronizing and waiting for the data to be sent because otherwise Red5 could end up sending packets with wrong RTMP (size) headers to the clients if packets are sent from multiple threads. One case of this is multiple clients updating lots of shared objects simultaneously - and wrong headers can cause the FP to segfault.

See  http://jira.red5.org/browse/APPSERVER-177 for details.

Changed 5 years ago by vicnov

Joachim,
as I mentioned above - we can keep the "synchronized" block but remove the "flush" and "join" calls. Can you or somebody else test it with SharedObjects (APPSERVER-177)? I will also ask Lenny Sorey to test (as I remember, his app uses SharedObjects).

I looked at how the IoSession.write method is implemented in mina: it adds the data to a LinkedList, this list is ordered, and the data will be sent asynchronously in the correct order.
I think there is no reason to wait until the data is sent. What do you think?
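The argument can be illustrated with a small model. This is only a sketch, not Red5 or mina code: a counter stands in for the per-connection state that each encoded packet depends on, and a `ConcurrentLinkedQueue` stands in for mina's ordered list of pending writes (all names here are hypothetical). As long as "encode" and "enqueue" happen inside the same synchronized block, the queue order matches the encode order even with several sender threads and no waiting for the network.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class OrderedEncodeSketch {

    private final Object state = new Object();   // stands in for the per-connection state
    private long nextSequence = 0;               // toy state: each packet depends on the previous one
    private final Queue<Long> writeQueue = new ConcurrentLinkedQueue<>();   // stands in for the ordered write list

    /** Encodes and enqueues atomically, without waiting for the data to be sent. */
    void send() {
        synchronized (state) {
            long encoded = nextSequence++;       // "encode" against the current state
            writeQueue.add(encoded);             // enqueue only; sending happens elsewhere
        }
    }

    /** Runs several sender threads and checks that the queue is in encode order. */
    static boolean queueStaysOrdered(int threads, int perThread) {
        OrderedEncodeSketch sketch = new OrderedEncodeSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    sketch.send();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        long expected = 0;
        for (long value : sketch.writeQueue) {
            if (value != expected++) {
                return false;                    // a packet was queued out of encode order
            }
        }
        return expected == (long) threads * perThread;
    }
}
```

The model says nothing about what happens below the queue; it only shows that ordering between encode and enqueue does not require blocking until the bytes leave the machine.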

Changed 5 years ago by vicnov

We have done more testing today to understand how the server works after our fix. Results are good for our application (attached here).

First of all, we worried about memory usage on the server side because Red5 calls IoSession.write() many times without waiting until the data is actually sent to the client. This data is stored in mina's LinkedList (of unbounded size), so this could give us a very big LinkedList for a slow consumer.
But we have noted that Red5 periodically sends "ping" messages to clients and waits for a "pong" answer during some period of time. If there is no "pong" answer, the connection is closed and all the resources are released (including IoSession's LinkedList with pending data). This is exactly what we see in our case - if a slow client is VERY OVERLOADED, there are many many pendingMessages (several hundreds and even thousands), in this case the client does not receive these "ping" messages and Red5 closes this client's connection after some time.
So, there should not be memory exhaustion in Red5 because of slow consumers.

Things are even better than I thought - Red5 DOES frame dropping for video data and it works correctly - when we have a slow consumer, most of the pending messages are non-video ones (audio, data, etc). If pendingVideos>1, the Flash/Flex client will receive an "onStatus" event in NetStream (code=NetStream.Play.InsufficientBW), so it is possible to show a warning to the user ("Your video stream (connection) is unstable...").
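The behaviour we observed can be modelled roughly as follows. This is a sketch of our understanding, not the actual Red5 code: the `PendingVideoSketch` class and its method names are hypothetical, and tying the drop decision to the same pendingVideos>1 check that triggers the status event is an assumption. Only the status code NetStream.Play.InsufficientBW is taken from what the client really receives.

```java
import java.util.ArrayList;
import java.util.List;

class PendingVideoSketch {

    static final String INSUFFICIENT_BW = "NetStream.Play.InsufficientBW";

    private int pendingVideos;                                     // video frames queued but not yet sent
    private final List<String> statusEvents = new ArrayList<>();   // what the client would see in onStatus

    /** Returns true if the frame was queued, false if it was dropped. */
    synchronized boolean offerVideo() {
        if (pendingVideos > 1) {
            statusEvents.add(INSUFFICIENT_BW);   // tell the client its link is too slow
            return false;                        // drop this frame instead of queueing it
        }
        pendingVideos++;
        return true;
    }

    /** Called when a queued video frame has actually been sent. */
    synchronized void videoSent() {
        if (pendingVideos > 0) {
            pendingVideos--;
        }
    }

    synchronized List<String> statusEvents() {
        return new ArrayList<>(statusEvents);
    }
}
```

With a policy like this, audio and data messages still accumulate for a slow client, which matches what we see in pendingMessages.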

I am also waiting for testing results from Lenny and other people; we will continue our testing too.

One question is still unclear to me: why does a pure Flash app (like Red5's "videoconference" sample) work without any problem (even without this Red5 fix), while our Flex-based app behaves differently...

Also, it is interesting that I was unable to produce a "slow consumer" on my Mac - I mean that in any case (at any load) it showed video (fps was low under load, of course), but pendingMessages and pendingVideos were both near zero. It seems that the Flash player's Flex implementation differs between platforms.

Changed 5 years ago by vicnov

We are continuing to test the bug fix - everything works correctly in our application (a live video chat).
Has anybody had problems with the bug fix?

Changed 5 years ago by skipidar

Hello Victor, we are working on a Red5 audio application, like a phone ... and we see strange client buffer behavior too. We use the 8 kHz mic mode and connect 2 clients; BufferTime is set to 0. On some computers we get a very small liveDelay - from 0 to 0.5 sec; on others it can be about 2 sec. The application must work in real time (0 - 0.5 sec is acceptable, but >1 sec is very bad). So we need to find a solution; our app is written in Flex. Maybe you have some ideas?

Changed 5 years ago by joachim

Just tested the SO stresstest from APPSERVER-177 with this bug fix - the FP still keeps crashing with the modified code... :(

Changed 5 years ago by joachim

Victor,

I slightly modified the "RTMPMinaProtocolEncoder" class in r2250. Could you please test this with your code to see if you still have any problems? APPSERVER-177 works as expected with the changes.

To enable the encoder after updating, change your red5-common.xml and re-enable the class around line 47 instead of the "FastRTMPMinaProtocolEncoder".

Thanks,

Joachim

Changed 5 years ago by vicnov

Joachim,
I have run our tests using RTMPMinaProtocolEncoder from r2250 - everything is OK now, our Flex live chat works well!

Thanks.

Changed 5 years ago by electroteque

Hi, what exactly does this fix? Does it fix the buffering / dropouts / dropped frames problems with live streams we've just been experiencing? The initial ticket was about the net send buffer being set too low by default, as was finally worked out months later.

Changed 5 years ago by vicnov

Dan,
what is fixed now is the problem I described here on Aug 14:

"
Briefly, the problem is:
if we have one very slow video consumer (it receives data slowly from Red5) and this consumer is a Flex app, then ALL OTHER consumers (of the same video provider) are also working slowly - Red5 does not send data to them.
"

How it was fixed: Red5 now does an asynchronous (non-blocking) "write" operation when sending data to consumers, without waiting until the data is actually sent to the network. This avoids blocking other consumers (because all "write" operations are done in one thread). So this works perfectly now in our application.

Changed 5 years ago by joachim

Paul's workaround for SN-1 is no longer needed in r2274.

Changed 3 years ago by vicnov

Changed 3 years ago by danielr

  • status changed from new to closed

Changed 2 years ago by bascorp
