Tuesday, October 2, 2012

Web Performance Tuning Tips Solutions for Drupal Sites

[] HAST + CARP + ZFS - High Availability (HA) cluster solution for FreeBSD.

HAST (Highly Available Storage) - allows data to be stored transparently on two physically separated machines connected over a TCP/IP network.

CARP (Common Address Redundancy Protocol) - allows multiple hosts to share the same IP address. In some configurations, this may be used for availability or load balancing. Hosts may use separate IP addresses as well.

http://www.freebsd.org/doc/handbook/carp.html
http://forums.freebsd.org/showthread.php?t=17133
http://blather.michaelwlucas.com/archives/241

[] MySQL Innodb storage engine.

[] ZFS file system - ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include data integrity (protection against bit rot, etc.), support for high storage capacities, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z, and native NFSv4 ACLs.

[] NFS (Network File System) - is a network file system protocol originally developed by Sun Microsystems in 1984,[1] allowing a user on a client computer to access files over a network in a manner similar to how local storage is accessed. NFS, like many other protocols, builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. The Network File System is an open standard defined in RFCs, allowing anyone to implement the protocol.

[] Apache MPM Worker (instead of prefork) + fcgid + APC + memcached
Use a threaded server and run PHP through fcgid.

Note: APC does not currently share its cache between multiple php-cgi workers running under fastcgi or fcgid. See this feature request for details: "this behaviour is the intended one as of now".

Note: If you want a threaded server, you must use the Apache Worker MPM instead of Prefork; otherwise threading will not be enabled. Use httpd -V to check which MPM the server was built with.
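The httpd -V check can be scripted by looking for the "Server MPM:" line in its output. This is a minimal Python sketch, assuming the binary is named httpd (some distributions call it apache2 or apachectl) and that the output contains a "Server MPM:" line. The function names and the sample output are illustrative, not taken from a real server.

```python
import re
import subprocess
from typing import Optional


def parse_server_mpm(httpd_v_output: str) -> Optional[str]:
    """Return the lowercased MPM name found in `httpd -V` output, or None."""
    match = re.search(r"^Server MPM:\s*(\S+)", httpd_v_output, re.MULTILINE)
    return match.group(1).lower() if match else None


def current_mpm(binary: str = "httpd") -> Optional[str]:
    """Run `<binary> -V` and parse its output (the binary name is an assumption)."""
    result = subprocess.run(
        [binary, "-V"], capture_output=True, text=True, check=False
    )
    return parse_server_mpm(result.stdout)


# Illustrative sample only; real output has more lines.
SAMPLE_OUTPUT = """Server version: Apache/2.2.15 (Unix)
Server MPM:     Prefork
  threaded:     no
"""

if __name__ == "__main__":
    mpm = parse_server_mpm(SAMPLE_OUTPUT)
    if mpm != "worker":
        print(f"MPM is {mpm!r}; switch to worker for a threaded server")
```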

Note: Keep in mind that it's impossible to share APC cache between mod_fcgid PHP-CGI instances. You have to use PHP-FPM for that. mmap_file_mask doesn't help when it comes to mod_fcgid. You can check by lsof -p PHP-PID and see that different /tmp/apc.* are memory mapped in different PHP processes. So you may be served by one PHP process and see empty APC cache by checking another PHP process - they have separate caches. Try FcgidMaxProcesses 1 and see if APC cache is still empty.
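To see why separate caches matter, here is a toy Python model (not APC itself) that counts how often a script has to be compiled when each worker keeps a private cache versus when all workers share one. It assumes round-robin dispatch, which is a simplification; the function name is hypothetical.

```python
from typing import Dict, List


def count_compilations(requests: List[str], workers: int, shared: bool) -> int:
    """Count cache misses (script compilations) for a stream of requests.

    Requests are dispatched round-robin across `workers` processes.
    shared=False: every worker has a private cache (the mod_fcgid case).
    shared=True: all workers use one cache (the PHP-FPM case).
    """
    shared_cache: Dict[str, bool] = {}
    private_caches: List[Dict[str, bool]] = [{} for _ in range(workers)]
    misses = 0
    for i, script in enumerate(requests):
        cache = shared_cache if shared else private_caches[i % workers]
        if script not in cache:
            cache[script] = True  # "compile" the script and remember it
            misses += 1
    return misses


if __name__ == "__main__":
    stream = ["index.php"] * 8
    print("private caches:", count_compilations(stream, workers=4, shared=False))
    print("shared cache:  ", count_compilations(stream, workers=4, shared=True))
```

With a single worker the two cases behave the same, which is why the FcgidMaxProcesses 1 test above is a quick way to confirm the diagnosis.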

How to share APC cache between several PHP processes when running under FastCGI?

FastCGI with a PHP APC Opcode Cache

Related articles (groups.drupal.org and 2bits.com):
http://groups.drupal.org/node/27174
http://2bits.com/articles/apache-fcgid-acceptable-performance-and-better-resource-utilization.html

[] Nginx - is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004. Nginx now hosts nearly 6.55% (13.5M) of all domains worldwide.

Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.

[] php-fpm (FastCGI Process Manager) - is an alternative PHP FastCGI implementation with some additional features useful for sites of any size.

[] Nginx + php-fpm + apc = Awesome

[] Disable the Apache modules that you don't need.

[] APC (Alternative PHP Cache) - a free, open source framework that optimizes PHP intermediate code and caches data and compiled code (opcodes) from the PHP bytecode compiler in shared memory. APC has become a de facto standard PHP caching mechanism and, at the time of writing, was slated for inclusion in the core of PHP starting with PHP 6.

[] Memcached - Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.
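The usual way to put such a store in front of a database is the cache-aside pattern: look in the cache first, and only on a miss run the expensive call and store its result. The Python sketch below uses an in-process stand-in (FakeCache, hypothetical) instead of a real memcached client so that it runs without a server; a real client would also take an expiry time.

```python
from typing import Callable, Dict, Optional


class FakeCache:
    """In-process stand-in for a memcached client (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def set(self, key: str, value: str) -> None:
        self._store[key] = value


def cached_fetch(cache: FakeCache, key: str, load: Callable[[], str]) -> str:
    """Return the cached value for `key`, calling `load` only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()          # e.g. a database query or page render
        cache.set(key, value)   # later requests skip the expensive call
    return value


if __name__ == "__main__":
    cache = FakeCache()
    db_calls = []

    def load_node() -> str:
        db_calls.append("SELECT ...")  # pretend this hits the database
        return "rendered node 42"

    cached_fetch(cache, "node:42", load_node)
    cached_fetch(cache, "node:42", load_node)
    print("database calls:", len(db_calls))
```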

[] Using MySQL with memcached
Chapter 14. High Availability and Scalability:

14.1. Using MySQL with DRBD
14.2. Using Linux HA Heartbeat
14.3. MySQL and Virtualization
14.4. Using ZFS Replication
14.5. Using MySQL with memcached
14.6. MySQL Proxy

[] HAProxy - High Performance TCP/HTTP Load Balancer

[] Varnish - an HTTP accelerator designed for content-heavy dynamic web sites. In contrast to other HTTP accelerators, many of which began life as client-side proxies or origin servers, Varnish was designed from the ground up as an HTTP accelerator.

[] MySQL MEMORY tables - tables that use the MEMORY storage engine keep their data in RAM, so SELECT statements against them are fast.

[] MySQL Proxy can be used to separate INSERT and SELECT statements. MySQL Proxy is a simple program that sits between your client and MySQL server(s) and can monitor, analyze, or transform their communication. Its flexibility allows for many uses; common ones include load balancing, failover, query analysis, query filtering and modification, and read/write (SELECT and INSERT) splitting.
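The idea behind read/write splitting can be sketched in a few lines of Python. This is not MySQL Proxy's own scripting interface; the QueryRouter class and the server names are hypothetical, and the routing rule (statements starting with SELECT or SHOW go to a replica, everything else to the master) is a deliberate simplification.

```python
import itertools
from typing import Iterator, List

# Statement prefixes treated as reads (an assumption for this sketch).
READ_PREFIXES = ("select", "show")


class QueryRouter:
    """Send reads to replicas in rotation and all other statements to the master."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._replicas: Iterator[str] = (
            itertools.cycle(replicas) if replicas else itertools.repeat(master)
        )

    def route(self, sql: str) -> str:
        statement = sql.lstrip().lower()
        if statement.startswith(READ_PREFIXES):
            return next(self._replicas)
        return self.master


if __name__ == "__main__":
    router = QueryRouter("master", ["replica1", "replica2"])
    for query in ["SELECT * FROM node", "INSERT INTO node VALUES (1)", "SELECT 1"]:
        print(router.route(query), "<-", query)
```

A production setup also has to keep transactions and statements such as SELECT ... FOR UPDATE on the master, and has to account for replication lag; the sketch ignores both.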

[] Nginx - an HTTP and reverse proxy server, as well as a mail proxy server, written by Igor Sysoev. It has been running for more than five years on many heavily loaded Russian sites, including Rambler (RamblerMedia.com). According to Netcraft, nginx served or proxied 4.70% of the busiest sites in April 2010.

[] LVS (Linux Virtual Server) - The Linux Virtual Server is a highly scalable and highly available server built on a cluster of real servers, with the load balancer running on the Linux operating system. The architecture of the server cluster is fully transparent to end users, and the users interact as if it were a single high-performance virtual server.
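How a load balancer picks a real server can be illustrated with least-connections scheduling, one of the policies LVS offers. The Python sketch below is a toy model, not LVS code; the class name and server names are hypothetical.

```python
from typing import Dict, List


class LeastConnectionsBalancer:
    """Pick the real server that currently has the fewest active connections."""

    def __init__(self, servers: List[str]) -> None:
        self.active: Dict[str, int] = {server: 0 for server in servers}

    def acquire(self) -> str:
        # On a tie, min() returns the first server in insertion order.
        server = min(self.active, key=lambda name: self.active[name])
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


if __name__ == "__main__":
    balancer = LeastConnectionsBalancer(["web1", "web2"])
    first = balancer.acquire()
    second = balancer.acquire()
    balancer.release(second)
    print(first, second, balancer.acquire())
```

Clients only ever see the balancer's address, which is what makes the cluster look like a single virtual server.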

[] Apache Solr - Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

[] Drupal - APC module - This module is only available for Drupal 7 because it has a new cache implementation. It allows using different backends for the types of caches. For example you could cache 'cache' and 'cache_bootstrap' in APC; 'cache_field' and 'cache_menu' in Memcached and store 'cache_filter' in the database.
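The per-bin assignment described above amounts to a lookup table with a default. The Python sketch below only models that idea; it is not the module's actual settings.php syntax, and the names BIN_BACKENDS and backend_for are hypothetical. The bin names and backends are the ones from the example above.

```python
from typing import Dict

# Bins without an explicit entry fall back to the database,
# which is Drupal 7's default cache backend.
DEFAULT_BACKEND = "database"

BIN_BACKENDS: Dict[str, str] = {
    "cache": "apc",
    "cache_bootstrap": "apc",
    "cache_field": "memcached",
    "cache_menu": "memcached",
    "cache_filter": "database",
}


def backend_for(bin_name: str) -> str:
    """Return the backend that should store the given cache bin."""
    return BIN_BACKENDS.get(bin_name, DEFAULT_BACKEND)


if __name__ == "__main__":
    for name in ("cache_bootstrap", "cache_menu", "cache_page"):
        print(name, "->", backend_for(name))
```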

[] Drupal - memcache module - An API for using Memcached and the PECL Memcache library with Drupal.

[] Drupal - boost module - Boost provides static page caching for Drupal, enabling a very significant performance and scalability boost for sites that receive mostly anonymous traffic. Boost is very easy to install and has been thoroughly tested on shared, VPS, and dedicated hosting. Apache is fully supported, with Nginx, Lighttpd, and IIS 7 semi-supported. Boost will cache and gzip-compress HTML, XML, Ajax, CSS, and JavaScript. Boost's cache expiration logic is very advanced; it's fairly simple to have different cache lifetimes for different parts of your site. The built-in crawler makes sure expired content is quickly regenerated for fast page loading. For shared hosting this is your best option in terms of improving performance.

[] Drupal - varnish module - This module provides integration between your Drupal site and the Varnish HTTP Accelerator, an advanced and very fast reverse-proxy system. Basically, Varnish handles serving static files and anonymous page-views for your site much faster and at higher volumes than Apache, in the neighborhood of 3000 requests per second.

[] Drupal - Cache Router module - CacheRouter is a caching system for Drupal allowing you to assign individual cache tables to specific cache technology. CacheRouter has an option to utilize the page_fast_cache part of Drupal in order to reduce the amount of resources needed for serving pages to anonymous users.

[] Drupal - Authcache - The Authcache module offers page caching for both anonymous users and logged-in authenticated users. This allows Drupal/PHP to only spend 1-2 milliseconds serving pages, greatly reducing server resources.

=============== [START] Database ================
[] MySQL - is a relational database management system (RDBMS)[1] that runs as a server providing multi-user access to a number of databases. MySQL is officially pronounced "My S-Q-L",[2] but is often also pronounced "My Sequel". It is named after developer Michael Widenius' daughter, My. The SQL phrase stands for Structured Query Language.[3]

The MySQL development project has made its source code available under the terms of the GNU General Public License, as well as under a variety of proprietary agreements. MySQL was owned and sponsored by a single for-profit firm, the Swedish company MySQL AB, now owned by Oracle Corporation.[4]

Members of the MySQL community have created several forks (variations) such as Drizzle, OurDelta, Percona Server, and MariaDB. All of these forks were in progress before the Oracle acquisition; Drizzle was announced eight months before the Sun acquisition.

[] PostgreSQL - often simply Postgres, is an object-relational database management system (ORDBMS).[4] It is released under an MIT-style license and is thus free and open source software. As with many other open-source programs, PostgreSQL is not controlled by any single company — a global community of developers and companies develops the system.

[] HBase - is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java. It is developed as part of Apache Software Foundation's Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.

HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original BigTable paper [1]. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API but also through REST or Thrift gateway APIs.
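A Bloom filter answers "might this key be present?" without reading the data: it can return false positives but never false negatives, which lets a store skip files that definitely do not hold a key. The Python sketch below shows the data structure in general, not HBase's implementation; the sizes and the hashing scheme are arbitrary choices for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


if __name__ == "__main__":
    row_keys = BloomFilter()
    row_keys.add("row-0001")
    print(row_keys.might_contain("row-0001"))  # an added key is always reported
```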

HBase is not a direct replacement for a classic SQL Database, although recently its performance has improved, and it is now serving several data-driven websites[2][3], including Facebook's Messaging Platform[4].

[] Apache Hadoop

[] MongoDB - (from "humongous") is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language.[1]

MongoDB combines the functionality of key-value stores, which are fast and highly scalable, and traditional RDBMS systems, which provide rich queries and deep functionality. It is designed for problems that are difficult to solve with traditional RDBMSs, for example databases spanning many servers.

The database is document-oriented so it manages collections of JSON-like documents. Many applications can, thus, model data in a more natural way, as data can be nested in complex hierarchies and still be query-able and indexable.
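To make "nested but still query-able" concrete, the Python sketch below stores JSON-like documents as plain dictionaries and looks them up by a dotted field path. It uses no MongoDB driver; get_path, find, and the sample documents are hypothetical and only mimic the style of such a query.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def get_path(document: Document, dotted_path: str) -> Any:
    """Follow a dotted path such as 'author.name' into nested dictionaries."""
    current: Any = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: List[Document], dotted_path: str, value: Any) -> List[Document]:
    """Return the documents whose field at `dotted_path` equals `value`."""
    return [doc for doc in collection if get_path(doc, dotted_path) == value]


# Illustrative data: one document nests an address inside the author.
POSTS: List[Document] = [
    {"title": "Tuning Drupal", "author": {"name": "ann", "address": {"city": "Oslo"}}},
    {"title": "Scaling MySQL", "author": {"name": "bob"}},
]

if __name__ == "__main__":
    for post in find(POSTS, "author.address.city", "Oslo"):
        print(post["title"])
```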

[] NoSQL - In computing, NoSQL is a term used to designate database management systems that differ from classic relational database management systems in some way. These data stores may not require fixed table schemas, and usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage,[1][2][3][4] a term that would include classic relational databases as a subset.

Notable production implementations include Google's BigTable, Amazon's Dynamo and Apache Cassandra.

[] Cassandra - is an open source distributed database management system. It is an Apache Software Foundation top-level project[1] designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature.[2] Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.[3]

[] Apache CouchDB - Apache CouchDB, commonly referred to as CouchDB, is a free web scale open source document-oriented database written in the Erlang programming language. It is a NoSQL product designed for local replication and to scale vertically along a wide range of devices. CouchDB is supported by commercial enterprises CouchOne and Cloudant.

=============== [END] Database ================

Reference:
http://gala4th.blogspot.com/2010/11/nginx-vs-haproxy-vs-lvs.html

scaling drupal - an open-source infrastructure for high-traffic drupal sites

scaling drupal step four - database segmentation using mysql proxy

http://cruncht.com/89/drupal-lamp-server-tuning

http://groups.drupal.org/node/146864
