Topic: [DEVBLOG] TQ Level Up (With Cluster Pics)

Mangala

  • Administrator
  • League of Extraordinary Gentleman
  • Posts: 7534
  • WTF did I do??
[DEVBLOG] TQ Level Up (With Cluster Pics)
« on: June 16, 2010, 01:38:39 PM »
Linky: http://www.eveonline.com/devblog.asp?a=blog&bid=769

Quote
What makes up the live EVE cluster, and how it's all done, is something of a mystery to the many who have speculated about what it's made of and how it's all connected together.

As you may know, the Tranquility (TQ) cluster will be down for maintenance on Wednesday, June 23, 2010, from 0900 to 1500 UTC.

With a migration and a bit of a redesign on the way, I thought it was about time to deliver the facts on what TQ is and what it will be.

Step one: A cozy new home

TQ has morphed and adjusted over the years as much as EVE Online has. It's gotten to the point where a couple of cabinets simply can't handle it anymore. So the first step is to move TQ to a bigger place. We'll still be in the same datacenter, connecting to you from multiple networks to ensure the best performance, but this time with a lot more space, more power and room to grow.

The new space is a whopping 79kW of power across 12 cabinets.  With the larger space and added power, we can now aggregate TQ, Singularity, and the ancillary EVE Services (web, forums, account management, etc.) into a single location in the datacenter.  This will provide better network connectivity, fewer intermediary devices and increased capacity.

As with any dense computing solution like our blade servers, heat is always a major concern. Sure, we get great management tools and reduced physical space requirements, but we still have to cool the servers. To do this we've moved from an ambient-cooled system (basically, the temperature of the open room is managed, but the cold air is not funneled directly to the server intakes) to a completely self-contained, closed-aisle cooling system. Cold air from the center of the aisle is force-fed into the cabinets, significantly reducing lost or wasted cool air and focusing the cold air where it's needed most. This takes the industry-standard "hot aisle/cold aisle" design a step further without having to do anything crazy like running servers in pools of liquid nitrogen (although that is pretty cool).
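A rough, back-of-envelope sketch of why directing the cold air matters: the airflow needed to carry heat away follows Q = rho * V * cp * dT. Only the 79 kW and 12-cabinet figures come from the post; the air properties and the 12 K intake-to-exhaust temperature rise are assumptions chosen purely for illustration, and treating the full 79 kW as heat load is a worst case, not a measured figure.

```python
"""Back-of-envelope cooling arithmetic for the new TQ space.

Only the 79 kW and 12-cabinet figures come from the devblog. Everything
else is an assumption for illustration: all power becomes heat, typical
air properties, and a hypothetical 12 K intake-to-exhaust temperature rise.
"""

TOTAL_POWER_W = 79_000   # devblog: 79 kW across the new space
CABINETS = 12            # devblog: 12 cabinets
AIR_DENSITY = 1.2        # kg/m^3, assumed (roughly room-temperature air)
AIR_CP = 1005.0          # J/(kg*K), assumed specific heat of air
DELTA_T_K = 12.0         # K, hypothetical temperature rise across the servers


def airflow_m3_per_s(heat_w: float, delta_t_k: float) -> float:
    """Airflow needed to carry away heat_w, from Q = rho * V * cp * dT solved for V."""
    return heat_w / (AIR_DENSITY * AIR_CP * delta_t_k)


per_cabinet_kw = TOTAL_POWER_W / CABINETS / 1000
total_airflow = airflow_m3_per_s(TOTAL_POWER_W, DELTA_T_K)

if __name__ == "__main__":
    print(f"average power per cabinet: {per_cabinet_kw:.1f} kW")                 # 6.6 kW
    print(f"airflow needed at dT={DELTA_T_K:.0f} K: {total_airflow:.2f} m^3/s")  # 5.46 m^3/s
```

Since the required airflow is inversely proportional to dT, halving the useful temperature rise (for example because cold air leaks past the intakes and dilutes the exhaust) doubles the amount of air that has to be moved, which is the kind of waste closed-aisle containment is meant to cut.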



Step two: Networking to 9000

Most of the traffic on the TQ network flows between the servers on the internal network. While the routers we use are quite powerful (Cisco 7600s with RSP720 route processors), our internal switching needed a kick in the pants. With the move we are adding about 800% capacity to our side-to-side (server-to-server) network, along with some really nice Cisco Distributed Forwarding Cards (DFC3) on the network blades themselves to reduce latency and lighten the burden on the supervisor cards that run the switches.
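To make the DFC3 point a little more concrete, here is a toy Python model. Roughly, a distributed forwarding card lets each line card make forwarding decisions from a local copy of the tables the supervisor programs, instead of referring every decision to the supervisor. All class and function names are hypothetical, and the model only counts lookups; it says nothing about real Cisco hardware timing or software.

```python
"""Toy model: centralized vs. distributed forwarding lookups (hypothetical names).

Without a DFC, every forwarding decision on a line card is referred to the
supervisor; with a DFC, the card holds its own copy of the forwarding table
and decides locally. Only lookup counts are modelled.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Supervisor:
    fib: dict[str, int]   # destination -> egress port
    lookups: int = 0

    def lookup(self, dst: str) -> int:
        self.lookups += 1
        return self.fib[dst]


@dataclass
class LineCard:
    supervisor: Supervisor
    local_fib: Optional[dict[str, int]] = None  # set only when a DFC is fitted
    local_lookups: int = 0

    def forward(self, dst: str) -> int:
        if self.local_fib is not None:
            self.local_lookups += 1            # decided on the card itself
            return self.local_fib[dst]
        return self.supervisor.lookup(dst)     # centralized path


def supervisor_load(num_cards: int, frames_per_card: int, distributed: bool) -> int:
    """Send frames through every card and return how many lookups hit the supervisor."""
    fib = {f"host-{i}": i % 48 for i in range(64)}
    sup = Supervisor(fib=fib)
    cards = [LineCard(sup, dict(fib) if distributed else None) for _ in range(num_cards)]
    for card in cards:
        for n in range(frames_per_card):
            card.forward(f"host-{n % 64}")
    return sup.lookups


if __name__ == "__main__":
    print("centralized:", supervisor_load(4, 1000, distributed=False))  # 4000
    print("distributed:", supervisor_load(4, 1000, distributed=True))   # 0
```

In the toy, the supervisor's per-frame work drops to zero once the cards carry their own tables; in real switches the supervisor still builds and distributes those tables, which the model leaves out.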



Step three: Pics or it didn't happen

We are going to continue sharing information about the infrastructure that makes EVE work in the next installment. Although not everyone gets excited about cabinets and datacenters, a few people do. I personally keep them posted on my wall at home. This is meant to be the first of many installments as we continue to improve the infrastructure that EVE runs on.

Step four: But, how does this help me get my ship back?

The increase in Layer 2 switching capacity, the reduction in latency through distributed forwarding, and the extra cold hamsters will all help reduce overall latency in EVE. It is not a single solution, but it is a good foundation that lets us eliminate core infrastructure as a possible concern.

The next tech installment will have more details on Remapping EVE, the next level of Fleet Fighting and better prediction of hot spots for dedicated nodes.

TQ Tech Details: (Not the whole system, just what runs TQ)

Servers
    64 x IBM HS21
        - 2 x dual-core 3.33 GHz CPUs
        - 32 GB of RAM each
        - 1 x 72 GB HDD each

    2 x IBM X3850 M2
        - 2 x six-core 2.66 GHz CPUs
        - 128 GB of RAM
        - 4 x 146 GB HDD

Cores
    - 280 total cores
    - ~1 THz

RAM
    - 2.3 TB of total RAM

Storage
    - 4.8 TB of local storage
    - 2 TB of SSD SAN
    - 256 GB of RAM SAN

Network
    - Gigabit Ethernet
    - 4 Gb/s Fibre Channel
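As a quick cross-check, the core, clock and RAM totals can be recomputed from the per-server specs. Reading the X3850 M2 lines ("2x Six Core", "128GB of RAM") as per-server figures is an assumption, but it is the reading that reproduces the 280-core and 2.3TB totals; the aggregate clock comes to about 916 GHz, which the post rounds to ~1 THz. Storage is not covered by this sketch.

```python
"""Re-derive the TQ summary totals from the per-server specs.

Assumption: the X3850 M2 CPU and RAM lines are per server.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServerModel:
    name: str
    count: int          # number of servers of this model
    cpus: int           # CPU sockets per server
    cores_per_cpu: int
    ghz: float          # clock speed per core
    ram_gb: int         # RAM per server


FLEET = [
    ServerModel("IBM HS21", count=64, cpus=2, cores_per_cpu=2, ghz=3.33, ram_gb=32),
    ServerModel("IBM X3850 M2", count=2, cpus=2, cores_per_cpu=6, ghz=2.66, ram_gb=128),
]


def totals(fleet: list[ServerModel]) -> tuple[int, float, int]:
    """Return (total cores, aggregate GHz across all cores, total RAM in GB)."""
    cores = sum(s.count * s.cpus * s.cores_per_cpu for s in fleet)
    ghz = sum(s.count * s.cpus * s.cores_per_cpu * s.ghz for s in fleet)
    ram_gb = sum(s.count * s.ram_gb for s in fleet)
    return cores, ghz, ram_gb


if __name__ == "__main__":
    cores, ghz, ram_gb = totals(FLEET)
    print(f"{cores} cores, {ghz / 1000:.2f} THz aggregate, {ram_gb} GB RAM")
    # 280 cores, 0.92 THz aggregate, 2304 GB RAM
```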

So what will June 23 look like?

Here's our current downtime schedule for when TQ will be offline (all times UTC):

0900: All EVE services go offline (web, forums, test servers, EVE Gate, TQ; basically everything hosted in London).

1200: EVE Online web, secure and test servers come back online (all network services re-established in London; only TQ should still be down at this point).

1500: TQ back online.
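For players working out their skill queues, the schedule boils down to two phases. A small sketch using Python's standard datetime module; the times are from the post, while the fixed UTC+1 offset for British Summer Time is an added assumption, not something the post states.

```python
"""Phase lengths of the announced June 23, 2010 downtime (times from the post, UTC)."""
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")  # assumed: UK summer time, UTC+1


def utc(hhmm: str) -> datetime:
    """Turn an 'HHMM' string from the schedule into a UTC datetime on June 23, 2010."""
    return datetime(2010, 6, 23, int(hhmm[:2]), int(hhmm[2:]), tzinfo=timezone.utc)


OFFLINE = utc("0900")         # all EVE services go offline
SERVICES_BACK = utc("1200")   # web, secure and test servers back online
TQ_BACK = utc("1500")         # TQ back online

all_down = SERVICES_BACK - OFFLINE   # everything hosted in London is down
tq_down = TQ_BACK - OFFLINE          # total TQ outage

if __name__ == "__main__":
    print("all services down for", all_down)                             # 3:00:00
    print("TQ down for", tq_down)                                        # 6:00:00
    print("TQ back at", TQ_BACK.astimezone(BST).strftime("%H:%M %Z"))    # 16:00 BST
```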

"May God stand between you and harm in all the empty places you must walk."


Mangala
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #1 on: June 22, 2010, 01:40:49 PM »
Remember to set LOOOOOONG skills to cover this tomorrow folks.  Moving server stuff could fuck EVE over for a few days which would be bad.


Caradir

  • HoJ Members
  • League of Extraordinary Gentleman
  • Posts: 3568
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #2 on: June 22, 2010, 04:31:19 PM »
Quote
Remember to set LOOOOOONG skills to cover this tomorrow folks.  Moving server stuff could fuck EVE over for a few days which would be bad.

I'm hoping Assault Ships IV will have less than 24 hours left when I log on, so I can add AS V to the end; that should be long enough ;)
"Banking was conceived in iniquity and was born in sin. The bankers own the earth. Take it away from them, but leave them the power to create money, and with the flick of the pen they will create enough deposits to buy it back again. However, take away from them the power to create money and all the great fortunes like mine will disappear and they ought to disappear, for this would be a happier and better world to live in. But, if you wish to remain the slaves of bankers and pay the cost of your own slavery, let them continue to create money." Josiah Stamp (Governor Bank of England 1928-41)

Mangala
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #3 on: June 22, 2010, 04:43:24 PM »
I'm not even going to log onto EVE after 3pm GMT tomorrow. I'll probably wait till 4pm BST THURSDAY before I do, just to be on the safe side.


Dehn

  • MAADI
  • Lurkers
  • Posts: 84
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #4 on: June 23, 2010, 05:28:58 AM »
Still have 9 days or so left on Surgical Strike V, so all fine here :)

Mangala
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #5 on: June 23, 2010, 05:46:48 PM »
So glad I'm on looooong skills :)


Mangala
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #6 on: June 24, 2010, 11:40:42 AM »
Welp.

Tomorrow anyone? ;)


Mangala
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #7 on: June 24, 2010, 01:06:42 PM »


Two of my chars' queues will run out (they have nothing super long...) this afternoon.



Mangala
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #9 on: June 24, 2010, 04:21:59 PM »
Quote
In gratitude for your patience, we will give an extra pool of skillpoints to all accounts (paying and trial) that were active at the beginning of this downtime, on one character per account. This skillpoint pool will be appropriately sized for the downtime time frame, universal across all accounts regardless of character attributes/implants and may be applied as each player wants.

This will be done through a new system in the development pipeline, currently scheduled for deployment next Tuesday’s patching opportunity* during regularly scheduled downtime. Since it has been “hot dropped” into the development plans, we will be providing step-by-step instructions for how to use it as soon as possible.

http://www.eveonline.com/news.asp?a=single&nid=3963&tid=1
« Last Edit: June 24, 2010, 04:28:00 PM by Mangala »


Warcold

  • MAADI
  • League of Extraordinary Gentleman
  • Posts: 3670
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #10 on: June 24, 2010, 05:23:03 PM »
You missed the other important bit:

Quote
*Coincidentally, next week we had already been planning to give another gift to all pilots. More on that gift Soon ™.
'Our lives are not our own. From womb to tomb, we are bound to others. Past and present. And by each crime, and every kindness, we birth our future.'

'We are not enemies, but friends. We must not be enemies. Though passion may have strained, it must not break our bonds of affection.
The mystic chords of memory will swell when again touched, as surely they will be, by the better angels of our nature.'


http://warthunder.com/en/registration?r=userinvite_3240166

Mangala
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #11 on: June 30, 2010, 02:28:53 PM »
Follow up blog to the move:

Quote
We wanted to give you some details about what happened on June 23, 2010, when we performed the Tranquility server move, why the move took much longer than we scheduled and what we are doing to prevent the issue that caused the extended downtime from happening again.

First, what did we get done?

Everything. The new Ethernet and Fibre Channel switches were installed, the servers were moved to the new, larger and cooler space, redundancies were put in place, etc. We actually got most of the work we had planned done within the timeline we had originally announced. Despite rumors and criticisms to the contrary, our plan included a significant time buffer for the work. We'd also been prepping the space for about three weeks beforehand: testing power and cooling, putting in place all the backbone cable systems for servers and switches, and getting external network connectivity verified and tested. To some it seemed that we randomly chose "six hours" as our total time frame; however, at no time would we make up numbers we didn't wholeheartedly expect to meet.

So what happened?

When we attempted to fire up the Tranquility database, we experienced some failures on the new storage area network we had just put in place. These issues were not discovered until we started running our normal cleanup jobs on the database (these jobs touch just about every part of the database) and started putting actual load on the storage area network. Once it was under load the problems surfaced, but by then the database and many of the vital tables needed to operate EVE had already been heavily corrupted.

Why did it take so long to get TQ back online?

In order to recover the database after finding and fixing the root cause, we had to replace the logical database with a new copy. A backup of the Tranquility database was deployed; we then began recovering the corrupted transaction logs and replaying them to fill in any missing data. You can think of this process much like your credit card statement. The statement shows a balance that may not reflect the burrito-buying spree you went on last night. To get the statement to match what you actually owe, you also need to add in the transactions for the burritos, soda and antacid you bought last night.

Then come the integrity checks. To verify that the database is in good shape, a number of very slow and CPU-intensive programs have to be run. This helps ensure that we are not going to cause further damage and that the database is in fact all there. These checks can take hours, and we started with the most vital tables (such as the Inventory Items Database that makes sure the Raven in your ship hangar is yours and exists in game) and worked our way down, in parallel with our QA team, who did a very thorough job of testing EVE in VIP Mode. VIP Mode is when Tranquility is up but accessible to CCP staff only (many of you noticed and were curious about why 30+ others were on TQ while you couldn't log in).

While rolling back the database and losing transactions was an option, we chose the longer path of recovery and testing to make sure no player actions in the game were lost due to the corruption.
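The credit card analogy maps onto the general restore-then-replay technique: start from the backup, apply only the logged changes made after it, in order, then run integrity checks. A miniature sketch under loudly stated assumptions: the "database" is a dict of balances, the backup records the last log sequence number (LSN) it already contains, and the orphaned-item check is a hypothetical stand-in for a real integrity check. This is not CCP's actual recovery tooling.

```python
"""Restore-then-replay in miniature (hypothetical data model, not CCP's tooling)."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int      # log sequence number, increases with every change
    key: str
    delta: int    # change applied to the value stored under key


def restore(backup: dict[str, int], backup_lsn: int, log: list[LogRecord]) -> dict[str, int]:
    """Start from the backup, then replay every logged change made after it, in order."""
    state = dict(backup)  # copy, so the backup itself is left untouched
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.lsn <= backup_lsn:
            continue  # already reflected in the backup; applying it again would double-count
        state[rec.key] = state.get(rec.key, 0) + rec.delta
    return state


def orphaned_items(items: dict[int, str], owners: set[str]) -> list[int]:
    """A basic referential integrity check: item IDs whose owner does not exist."""
    return sorted(item_id for item_id, owner in items.items() if owner not in owners)


# The statement (the backup) shows 100 owed and covers everything up to LSN 10.
STATEMENT = {"card": 100}
LAST_NIGHT = [
    LogRecord(10, "card", 7),    # already on the statement
    LogRecord(11, "card", 12),   # burritos
    LogRecord(12, "card", 2),    # soda
    LogRecord(13, "card", 4),    # antacid
]

if __name__ == "__main__":
    print(restore(STATEMENT, 10, LAST_NIGHT))                         # {'card': 118}
    print(orphaned_items({1: "pilot-a", 2: "pilot-b"}, {"pilot-a"}))  # [2]
```

Skipping records at or below the backup's LSN is what keeps replay from double-counting; the rollback alternative mentioned in the post would amount to stopping at the backup and discarding the later records.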

What are we doing to prevent this?

As you may know, EVE's database is a fairly big and powerful thing. To maintain it and reduce the recovery time in situations like this, we are putting a project in place to modify nearline recovery and establish faster rebuilds of transactions should gaps exist. The team is currently working on the specifics of this new database architecture. Once we have the new design plan in place and tested, we'll post more details and some drawings of the changes.
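The post doesn't say how the faster rebuilds will work. One plausible first step, offered purely as an illustrative assumption, is detecting exactly where a sequence of transaction log numbers has gaps, so that only those ranges need rebuilding. The helper name is hypothetical.

```python
"""Find gaps in a sequence of transaction log numbers (hypothetical helper)."""
from __future__ import annotations


def find_gaps(seq_numbers: list[int]) -> list[tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges between the lowest and highest number seen."""
    seen = sorted(set(seq_numbers))  # tolerate duplicates and out-of-order input
    gaps = []
    for prev, cur in zip(seen, seen[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


if __name__ == "__main__":
    print(find_gaps([1, 2, 3, 6, 7, 10]))  # [(4, 5), (8, 9)]
```

Gaps before the lowest or after the highest number seen are not reported; detecting those would need an externally known start and end point.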

We all really appreciate the understanding and kind words... and even the harsh ones we needed to hear.

Let us know if you have more questions, and I'll do what I can to keep up and answer them for the next few days.

Fly Dangerous.


Warcold
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #12 on: June 30, 2010, 03:38:57 PM »
Quote
new database architecture
Make sure you have a 30-day skill handy, everyone!

Caradir
Re: [DEVBLOG] TQ Level Up (With Cluster Pics)
« Reply #13 on: June 30, 2010, 03:40:44 PM »
Quote
new database architecture
Make sure you have a 30-day skill handy, everyone!

Cruisers V could be going into my q then ;)