Tim Stephenson (Archant IS)

Members
  • Posts

    159
  • Joined

  • Last visited

    Never

Posts posted by Tim Stephenson (Archant IS)

  1. Guys,

    Just thought I'd chip in here.

    PMs cannot be read by anyone other than the author and the intended recipient.

    It is not a case of having the right password - the forum software doesn't allow any users other than the author and recipient to access the message.

    Mods or forum Admins have no direct access to any of the web or database servers in our web farm and have no way to bypass forum security.

    Should you or anyone else have any technical queries relating to this you're more than welcome to contact me and I'll try to assist.
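
A minimal sketch of the access rule described in this post, assuming a hypothetical `PrivateMessage` record and `can_read` helper. These names are illustrative only and are not taken from the forum software. Note that the check relies purely on the identity of the user and never on a password or role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateMessage:
    """Hypothetical message record; field names are assumptions."""
    author_id: int
    recipient_id: int
    body: str


def can_read(message: PrivateMessage, user_id: int) -> bool:
    # Only the author and the intended recipient may read the message.
    # There is deliberately no moderator or admin override here.
    return user_id in (message.author_id, message.recipient_id)


pm = PrivateMessage(author_id=1, recipient_id=2, body="hello")
```

With this rule, `can_read(pm, 1)` and `can_read(pm, 2)` are true, while any other user ID is refused.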
  2. We had a major storage system failure yesterday afternoon, which took this forum & a variety of other Archant hosted sites offline.

    Emergency work to restore service continued through the night and has been completed successfully. Work will continue today on a more permanent solution, but the sites (and this forum) should hopefully remain online.

  3. "Thank you Tim. Was that a complex way of saying that someone pulled the plug out or that the keys were lost or that an employee under notice was hyperactive with the chewing gum in the moving parts"

    None of the above unfortunately... a large, very expensive bit of equipment decided to break in a way no one had expected it to break :(
  4. "Interestingly, the property bit of the website (and therefore the bit that earns money) was up and running well in advance of the forum."

    The property service is hosted on a separate set of servers, and as such wasn't actually affected by the failure.

    Any downtime on that system is unfortunately nothing more than a coincidence!
  5. All,

    Apologies for the downtime over this weekend. We've had a series of major failures within our web hosting infrastructure that knocked ALL Archant hosted sites offline.

    Our core web storage systems failed late Friday afternoon, which took most of our sites offline and the rest unfortunately followed later Friday evening.

    The faults were escalated to the highest level within the storage system vendor's management, and a team of 3 vendor engineers & senior Archant IT staff worked virtually non-stop through the weekend to bring systems back online.

    Most of the major failures were resolved by around 02:30 Sunday morning, but it has taken until this morning to fully iron out all of the remaining stability & minor issues.

    Our sites are now all back up & running, although we're awaiting some replacement hardware which should be delivered to our data centre (along with an engineer to install it) tomorrow morning.

    The site may experience a further brief outage tomorrow but otherwise stability should return.

    Kind regards

    Tim
  6. If the undelivered emails were messages you've sent from your own email account, then it's nothing to do with our servers...

    Would advise double checking the email address you were trying to send an email to and resending :)

  7. Apologies for the delayed emails... one of our outbound email servers decided to stop sending messages for 48 hours due to running low on disk space.

    It's taken about 2 days to clear the backlogged messages!

  8. Think I probably beat you to grasping wood...

    We're moving some sites around between servers to better distribute our traffic across available machines. We'll try and keep any disruption to a minimum but if you get any strange errors - please just refresh your page.

    Back to normal soon hopefully.
  9. We're still trying to get to the bottom of what caused the crashes to happen so suddenly...

    The storage system supplier currently believes our system suffered some form of critical failure that took the units offline. Replacement hardware is being dispatched to us for delivery Monday.

    Am sat at home however as opposed to the office!
  10. It certainly does cost a substantial amount of money to keep everything running smoothly. Unfortunately, as today's demonstrated, sometimes even the best laid plans can fall apart...

    One of our web filers developed a fault this afternoon that's managed to corrupt the website file systems.

    This has meant that the webservers have been unable to access certain elements of our sites (including most of the forum systems), and the variety of "Server Error" pages you've probably been seeing were the direct result.

    Work's under way at the moment to roll back to an earlier version of the filesystem and to restore stability.

    Myself and another IS engineer are on the case, along with emergency support from the storage system vendor.

    Hopefully things will be a little smoother through the weekend, and our 4 additional webservers should be online early next week - bringing further performance and stability improvements.

    Tim
  11. Apologies for this - our web servers were set to be too aggressive in "recycling" the forum website application, causing periodic freezes while they clear out all used memory before continuing to handle traffic.

    In theory these "recycles" should not impact site performance, but when they're happening every 5 minutes something's not quite right... (you'd normally expect a handful of recycles a day, not 2 every 10 minutes!).

    The settings concerned have now been relaxed, and we're keeping an eye on things to further tune performance through the day.

    Tim
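
To put the recycle rate from this post in perspective, here is a rough back-of-the-envelope sketch. The 5-minute interval comes from the post; the helper name `recycles_per_day` and the figure of 5 used to stand in for "a handful" are assumptions made purely for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def recycles_per_day(interval_minutes: float) -> float:
    """Number of recycles in a day if one happens every interval_minutes."""
    return MINUTES_PER_DAY / interval_minutes


# Interval reported in the post: one recycle every 5 minutes.
observed = recycles_per_day(5)

# Assumed stand-in for "a handful of recycles a day" (hypothetical value).
expected_handful = 5

# How many times more often the application was recycling than expected.
ratio = observed / expected_handful
```

One recycle every 5 minutes works out to 288 recycles a day, which is why the brief freezes were so noticeable.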

  12. [quote user="Mel "]

    Here in Archant country in Norwich, the websites of both the Eastern Daily Press and the Evening News have been periodically off-line over the last day or so.

    Can somebody at I.S. confirm that these problems are down to out-dated equipment being used by one of the country's leading media groups?  

    [/quote]

    Unfortunately we can't confirm that ;)

    The recent instability was actually down to the introduction of a new, state-of-the-art network storage system - and a resulting critical problem with our Windows webservers. We've been working with both the storage system manufacturer and Microsoft's product support engineers to resolve the problem and have just completed live testing on a hot fix designed to work around some of the restrictions in the Windows networking specifications.

    So far so good, and stability seems to be returning to the server farm!

    There may be more issues over the next week or so as we migrate more of our sites across to the new systems, but we'll obviously work to minimise any disruption by conducting the majority of the work out of hours.

    Regards,
