Posted by diego on May 19, 2008 – 9:58 am
From where you’re sitting as a Network Creator on Ning, platform releases aren’t very transparent. The Ning Platform and your social networks go down for a couple of hours, then come back up, and not much has changed.
With nothing visible to you, the natural question is: why did we take everything down at all? And, on rare occasions like this past weekend, one release is followed immediately by another, and you’re left wondering what is happening.
So, here’s a little on what went on behind the scenes of Saturday night’s release, and the update that followed on Sunday night…
Posted by Alex on May 9, 2008 – 7:58 pm

There will be a planned platform maintenance this Saturday night May 10th, 2008 from 11pm-2am Pacific. During this window, Ning.com and all networks will be offline.
This maintenance will improve scalability and performance across the Ning Platform. To stay up to date during this release, be sure to check out the Ning Status Blog.
Posted by Alex on April 9, 2008 – 6:46 pm

We’ll be rolling out some awesome new features Thursday April 10th, which means that Ning and your networks will be unavailable from 9pm to 9:30pm Pacific Time.
If you’d like to see how things are going, the Ning Status Blog is the place to hang out.
Posted by Gina Bianchini on March 7, 2008 – 6:33 pm
Performance and stability are our number one priority. In this context, you can probably understand just how un-awesome we rate this week. It was atypical and we are committed to ensuring that it stays that way.
As a free service, we love that you hold us to the same uptime standards offered by an expensive service provider. While we don’t offer Service Level Agreements, we are committed to providing you the absolute best, highly scalable platform for creating your own social network for anything.
To put the atypicalness of this week in context, here are our uptime statistics for the past two months compared to the first week in March:

In January, we had 99.72% uptime. In February, it was 99.41%. And for the first week of March - also known as this week - our uptime as a percentage of the first 7 days is 95.51%.
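To make those percentages concrete, here is a rough sketch that converts each uptime figure into minutes of downtime. It assumes, and this is only an assumption, that uptime is measured against every minute of the period (31 days for January 2008, 29 for February 2008, 7 for the first week of March); the function name `downtime_minutes` is made up for illustration.

```python
def downtime_minutes(uptime_percent, days):
    """Minutes of downtime implied by an uptime percentage over `days` full days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Period lengths are assumptions: whole calendar months (2008 was a leap year)
# and a full 7-day week for the start of March.
periods = [
    ("January 2008", 99.72, 31),
    ("February 2008", 99.41, 29),
    ("March 1-7, 2008", 95.51, 7),
]

for label, uptime, days in periods:
    minutes = downtime_minutes(uptime, days)
    print(f"{label}: about {minutes:.0f} minutes ({minutes / 60:.1f} hours) down")
```

Under those assumptions, January and February each work out to a few hours of downtime over the whole month, while the first week of March alone comes to roughly seven and a half hours.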
The issues this week were primarily (but not exclusively) a result of the distributed caching issue that we talked about on Monday. This issue has been resolved and is now closed.
Separately, today we had a six minute downtime which was preceded by 20 minutes of slowness and errors on the social networks across the Ning Platform. We quickly identified and resolved the issue, which was separate and distinct from the distributed caching issue earlier this week. We are currently working on a patch to ensure today’s issue doesn’t happen again.
To be clear, none of these events have been a result of challenges scaling individual networks or the platform as a whole. The Ning Platform has demonstrated nothing but grace under pressure while quadrupling traffic in the last two months. It continues to hum along happily today despite over 1.5% daily compounding page view growth.
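For a feel of what daily compounding means here, the sketch below relates a daily growth rate to the total growth over a period. The 60-day window is an assumption standing in for "the last two months", and both helper names are hypothetical.

```python
def growth_factor(daily_rate, days):
    """Total growth multiple after compounding `daily_rate` once per day for `days` days."""
    return (1.0 + daily_rate) ** days


def daily_rate_for(factor, days):
    """Daily compounding rate needed to grow by `factor` over `days` days."""
    return factor ** (1.0 / days) - 1.0


DAYS = 60  # assumption: "two months" taken as 60 days

print(f"1.5% per day for {DAYS} days: {growth_factor(0.015, DAYS):.2f}x")
print(f"Daily rate needed to quadruple in {DAYS} days: {daily_rate_for(4.0, DAYS):.2%}")
```

With these assumptions, exactly 1.5% a day compounds to roughly 2.4x over 60 days, and quadrupling in the same window takes about 2.3% a day, which fits the "over 1.5%" wording.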
We take this week’s unplanned downtimes extremely seriously. Our goal at this point is to eliminate any additional unplanned downtime this month while limiting the planned maintenance windows for new releases to critical issues that need to be addressed. The proof of our work will be in the pudding, namely our continued goal of well above 99% uptime.
So, while we continue to drive greater performance and stability, expand the viral features in your networks, and add more new features to make your networks increasingly spectacular, keep holding us to your highest standards. It’s a fraction of what we ask of ourselves.
Posted by diego on March 3, 2008 – 11:52 pm
Today we experienced some turbulence on the platform and as a result we were down on two separate occasions for about 20 minutes each time. In fact, we have had similar issues twice before in the last two weeks, making for a total of about 60 minutes of unplanned downtime since then, including today’s 40 minutes. Today’s downtimes have taken us a little longer to recover from because we spent some extra time gathering information to debug the problem.
In complex software systems there are at times issues that arise from a combination of factors, such as load and which servers are involved in a given operation, and that depend on real-time traffic. This makes such problems, unfortunately, very hard to replicate and debug. This is why it’s taking us longer than usual to identify and fix the problem.
What we do know is that these incidents are all related to the same distributed caching system (although causing issues in different configurations) and for the last two weeks we’ve been hard at work on tracking down exactly what causes the problem and how to fix it. We have also put in place some measures to improve the situation and reduce the probability of future downtimes if we see this specific issue happen again.
What to expect from here
We take this problem extremely seriously. We have people dedicated to addressing it and we believe we’re making progress. The fact that we’re still working on it doesn’t mean we will have to take downtime to solve it; whenever possible, we deploy fixes to components live. That being said, if there is another outage due to this problem, we have a set of actions planned that should let us recover fairly quickly.
Just like before, this problem is affecting runtime, not storage, so the data on your Ning Network remains safe.
We will post an update here on the Ning Blog on this topic tomorrow night or as soon as we have more information. In the meantime, you can get in touch with us via the Help Center or watch for more real-time updates on Network Creators and the Ning Status Blog.
Thanks for your patience as we hunt down and resolve this issue!
Posted by Gina Bianchini on March 3, 2008 – 3:13 pm
I would like this particular Monday a whole lot better if we weren’t now experiencing the second of two unplanned downtimes today.
First, let me apologize for this inconvenience. Stability and performance are our top priorities and this is an extraordinary event that we want to keep extraordinary. While we’re addressing the production issue right now and will continually improve our uptime and stability, this level of downtime isn’t acceptable to us. It’s just not the kind of service we want to be.
We believe this issue - as well as the brief issue from last week - is the result of a complex bug with an evasive trigger that we’ve identified and are working through as we speak. We will have a more detailed update as we get everything back online and ideally stable through the rest of today.
We appreciate your patience with us as we address this situation and ensure that we eliminate it going forward. I’ll have another update on our progress and status by 3:30pm PST.
3:30pm Update: We’re back online and investigating what happened. I’ll have another update tonight.
Posted by athena on February 29, 2008 – 9:17 pm

It will be exactly like this except with more release goodness and less neon lighting.
Ning and the social networks running on it will be unavailable on Saturday night, March 1st from 11pm to 2am Pacific Time (originally announced as 11pm to 1am) while we upgrade the platform in preparation for our upcoming network release. We will be back in action early Sunday morning. Thanks!
UPDATE at 9pm on Saturday: We’re now looking at a window from 11pm to 2am.
Posted by diego on February 12, 2008 – 4:06 pm
This morning we found that we had introduced a problem in the release last night that affected the stability of the platform. This was the reason for the downtime early today, which lasted for a little over an hour.
We also had downtime for about 20 minutes at 2 PM PST. The source of this was related to, but different from, the problem we experienced earlier in the morning.
We do know that what is creating instability is part of the distributed caching infrastructure we use to speed up data access operations. However, we don’t know exactly why this is happening, although we’re working feverishly to find the source of the problem and fix it.
As part of this work, we are now taking a 30 minute planned downtime to restart some components and remove new systems we added earlier today.
If the problem happens again later today, we may have to roll back the release done last night to make sure that some subtle bug isn’t responsible for this instability.
This problem is affecting runtime, not storage, so the data on your Ning Networks remains safe.
Rest assured that solving this problem is our highest priority right now. We will continue to post updates as we have them, here, as well as on Network Creators and the Ning Status Blog.
Thanks!
Posted by Gina Bianchini on February 12, 2008 – 9:44 am
One never likes to put out a new release to find an unexpected issue the next morning. Yet, that’s exactly the situation we found ourselves in today. Our sincere apologies.
We introduced a database issue with this release that we’re currently patching during this downtime. This was unexpected, and it is the type of issue that shows up in production despite copious amounts of testing.
We should be back online by 9:50am PST and I’ll have another update shortly.
Again, our sincere apologies and we’re working to ensure that we get out in front of anything else that might be an issue today.
Thanks again for your patience. We really appreciate it!
UPDATE AT 9:50AM PST: We are going to need 5 more minutes as we restart the systems.
UPDATE AT 10:00AM PST: And…we’re back. Please let us know via the Ning Help Center if you are still seeing any issues and we’ll keep you posted as we debrief the situation.
Posted by Kyle Ford on February 11, 2008 – 1:52 pm

We’re rolling out some exciting new features tonight, and this means that Ning and your networks will be unavailable from 11pm to 12am Pacific Time.
If you’d like to follow the progress, the Ning Status Blog is the place to be.