The Charm of Software Architecture, Part 2: Reading Notes

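When it comes to methodology, my understanding is that it is guidance for solving the problems we run into at work.

First, let's look at what a software architect's work includes: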

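1 Business level: understand the full background of the problem, then model and analyze it.
2 Technical level: no need to say much here; this is the main work.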

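Here is a definition I copied down:
The architect of a software system is the person who takes on the definition of the system, the realization of its architecture, the implementation of the system, and the evolution of both the architecture and the system; in short, the person responsible for the system's entire life cycle.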

From this we can see that an architect needs broad technical, business, process, and social experience.

There are 2 ways to guide an architect's work:

1 Provide a complete Architecture Framework that can be tailored and applied.

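An architecture framework is not an application development framework like J2EE, .NET, or OSGI; it is a framework system dedicated to building architectures. Such a framework is used to produce an Architecture Description, which describes the roles, subsystems or components, processes, and data dependencies in the system, and how these parts interact with and depend on each other. It will, of course, also provide reference practices.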

Well-known ones include: RIM-OOP, Catalysis, TOGAF, ZIFA, EA, MODAF, DODAF.

The ones I would recommend are RIM-OOP and TOGAF.

As an aside, on the difference between System Architecture and System Design, I find the following explanation quite good:
I would take it as the difference between designing a system (a collection of entities working together to accomplish a common goal, or a protocol or process for the same) to do something specific, but not necessarily paying attention to the environment that it is in, whereas architecture implies designing something that fits in with it’s surroundings, does what it is intended without offending anything else around it, and maybe even has an aspect of beauty to it.

Posted in Reading Notes, Technical Thoughts | Leave a comment

The Charm of Software Architecture: Reading Notes

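I quite like this sentence: when architecture is at its highest level of harmony, beauty is attained.

When entering a new field, the first thing to learn is its terminology. Understanding the terms gives you a first impression of the field.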

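So what is architecture? And what is system architecture?

The word architecture comes from the field of building design. Generalized, it is the product of a mapping (a subjective one) of the elements in the domain under study and the relationships between them.

System Architecture: doing architecture systematically, that is, guided by a complete methodology.

Likewise, the software field has methodologies to guide this work.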



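Maze algorithm

Sometimes agreeing on a rule works better than complicated conditional checks in code. A simple clockwise convention is far better than a lot of bookkeeping logic.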
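
One possible reading of the clockwise convention is the right-hand rule: list the four directions clockwise and, at every cell, always try "turn right, go straight, turn left, go back" in that order. The sketch below is my own illustration, not code from the original note; `solve_maze`, `MAZE`, and the `max_steps` guard are hypothetical names. The rule is only guaranteed to reach the goal when all walls are connected to each other, hence the step budget.

```python
# Directions listed clockwise: up, right, down, left (row, col deltas).
CLOCKWISE = [(-1, 0), (0, 1), (1, 0), (0, -1)]

# Hypothetical sample maze: '#' is a wall, everything else is open.
MAZE = [
    "#######",
    "#S#...#",
    "#.#.#.#",
    "#...#G#",
    "#######",
]

def solve_maze(grid, start, goal, heading=1, max_steps=10_000):
    """Walk from start to goal with the right-hand rule.

    heading is an index into CLOCKWISE. Returns the list of cells
    walked through, or None if the walker is boxed in or the step
    budget runs out.
    """
    def is_open(r, c):
        return 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] != "#"

    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            return path
        # The whole decision logic: right, straight, left, back.
        for turn in (1, 0, 3, 2):
            d = (heading + turn) % 4
            nxt = (pos[0] + CLOCKWISE[d][0], pos[1] + CLOCKWISE[d][1])
            if is_open(*nxt):
                heading, pos = d, nxt
                path.append(pos)
                break
        else:
            return None  # walls on all four sides
    return None
```

Calling `solve_maze(MAZE, (1, 1), (3, 5))` walks the single corridor from `S` to `G`. No visited set or backtracking stack is kept; the fixed turn order does the work.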

Posted in Uncategorized | Leave a comment

A slide from Google

Know Your Basic Building Blocks

Core language libraries, basic data structures, protocol buffers, GFS, BigTable, indexing systems, MySQL, MapReduce, …

Not just their interfaces, but understand their implementations (at least at a high level)

If you don’t know what’s going on, you can’t do decent back-of-the-envelope calculations!

This made some directions clearer for me.
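
To make "back-of-the-envelope calculation" concrete, here is a small sketch of the kind of estimate the slide has in mind. The scenario (a page that loads 30 thumbnails of 256 KB each from disk) and the two latency figures are assumed round numbers for illustration only, not measurements from any real system.

```python
# Assumed round figures, not measurements.
DISK_SEEK_S = 0.010                     # one disk seek: ~10 ms (assumption)
DISK_READ_BYTES_PER_S = 30 * 1024 ** 2  # sequential disk read: ~30 MB/s (assumption)

def serial_read_time(n_files, file_bytes):
    """One disk reads the files one after another: a seek plus a transfer each."""
    return n_files * (DISK_SEEK_S + file_bytes / DISK_READ_BYTES_PER_S)

def parallel_read_time(file_bytes):
    """Every file sits on its own disk and all reads are issued at once."""
    return DISK_SEEK_S + file_bytes / DISK_READ_BYTES_PER_S

THUMBNAIL_BYTES = 256 * 1024
print(f"serial:   {serial_read_time(30, THUMBNAIL_BYTES) * 1000:.0f} ms")
print(f"parallel: {parallel_read_time(THUMBNAIL_BYTES) * 1000:.0f} ms")
```

Under these assumptions the serial design costs roughly 550 ms and the parallel one about 18 ms. The estimate is only possible if you know that a disk read is a seek plus a transfer, which is the slide's point about understanding implementations and not just interfaces.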


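2010-08-02 Work log

Today's work had two main parts:

1 Tracking down a performance problem on a customer's website:

This has actually been going on for a while, and today it finally started to make sense. The symptom was that the customer's site went down frequently during daily trading hours.

When I run into this kind of situation, my workflow usually goes like this:

A Collect data and logs along the path Apache server -> WebSphere server -> database server, including CPU usage, memory usage, disk usage, and system and error logs.

B Identify the root cause and work out a fix.

The symptoms this time: Apache and WebSphere were deployed on one machine and the database on another. During an outage, CPU utilization on the Apache/WebSphere machine was very low, the number of HTTP requests was fairly high, and HTTP response time was very long; on the database server, CPU utilization stayed high and memory was almost exhausted. Tracing the DB turned up some SQL statements that consumed a lot of CPU. I analyzed the problem from 2 angles: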

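A Application level: I started by analyzing the application's implementation. My first judgment was that the application's SQL took too long to execute and consumed CPU, so web requests could not finish in time, which in turn blocked the web server. The DBA then analyzed the SQL and found it could not be optimized further (the statements that lacked an index had already been adjusted to use a composite index). Two options came to mind first: 1: cut features; 2: add a cache. The customer rejected cutting features, so caching was the only choice. Given the workload, I used oscache directly at the page level rather than caching in the service layer.

B System level: the customer's DBA found that virtual memory on the database machine had exceeded 2 GB, beyond the configured file size, so the database was adjusted accordingly. The database's index settings were also optimized.

After these adjustments, I monitored the system for a day today and it behaved fairly normally. It is worth mentioning that using cacti to monitor system usage was very effective.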
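
The caching idea above can be sketched as a small time-to-live cache in front of the expensive query. This is not the oscache API (which is a Java library); `TTLCache`, `get_or_compute`, and the injectable `clock` are hypothetical names for illustration.

```python
import time

class TTLCache:
    """Keep computed values for ttl_seconds so repeated requests skip the slow query."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}      # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]       # still fresh: the database is not touched
        value = compute()       # expired or missing: run the expensive call once
        self._entries[key] = (now + self.ttl, value)
        return value
```

A page handler would call something like `cache.get_or_compute("quote-page", run_slow_sql)`, so that within one TTL window the database sees a single query no matter how many requests arrive. The trade-off is that users may see data up to `ttl_seconds` old.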

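2 Thinking about the product design of 蚂蚁微股:

There are 2 main points to consider:

A How to make it more fun: is it better with exchange, or without?

B How to make it more extensible: can rankings achieve the expected effect? Can spoofs stimulate outward spread?

I am rather torn on both points.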


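Notes from reading through James Hamilton's On Designing and Deploying Internet-Scale Services

I spent the weekend reading through "On Designing and Deploying Internet-Scale Services" and checking it against my own years of experience, and got a lot out of it. Without further ado, excerpts follow; * means strongly agree, and ? means I don't quite understand it yet. The original paper can be downloaded as a PDF.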

1 Expect failures.

2 Keep things simple.

3 Automate everything.

 

  • The basic design tenets and considerations we have laid out above are:

1 design for failure

2 implement redundancy and fault recovery

3 depend upon a commodity hardware slice

4 support single-version software.

5 implement multi-tenancy.

 

  • More specific best practices for designing operations-friendly services are:

1 Quick service health check

2 Develop in the full environment

3 Zero trust of underlying components:
  common techniques are to:
  I) continue to operate on cached data in read-only mode or
  II) continue to provide service to all but a tiny fraction of the user base during the short time while the service is accessing the redundant copy of the failed component.

4 Do not build the same functionality in multiple components.

5 One pod or cluster should not affect another pod or cluster.

6 Allow (rare) emergency human intervention. It’s very interesting.

7 Keep things simple and robust.

8 Enforce admission control at all levels.

9 Partition the service.
  recommend using a look-up table at the mid-tier that maps fine-grained entities, typically users, to the system where their data is managed.  

10 Understand the network design. ???

11 Analyze throughput and latency.****

12 Treat operations utilities as part of the service.

13 Understand access patterns.???
   What impacts will this feature have on the rest of the infrastructure?

14 Version everything.

15 Keep the unit/functional tests from the last release.

16 Avoid single points of failure. ****
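
For item 9 (Partition the service), the mid-tier look-up table can be sketched as follows. This is a minimal in-memory illustration under my own assumptions; `PartitionDirectory`, `assign`, and `move` are hypothetical names, and a real service would persist the table.

```python
class PartitionDirectory:
    """Mid-tier look-up table: fine-grained entity (user id) -> partition holding its data."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self._table = {}  # user_id -> partition name

    def assign(self, user_id):
        """Return the user's partition, placing new users on the least-loaded one."""
        if user_id not in self._table:
            load = {p: 0 for p in self.partitions}
            for p in self._table.values():
                load[p] += 1
            self._table[user_id] = min(self.partitions, key=lambda p: load[p])
        return self._table[user_id]

    def move(self, user_id, partition):
        """Rebalance one user without touching anyone else's mapping."""
        if partition not in self.partitions:
            raise ValueError(f"unknown partition: {partition}")
        self._table[user_id] = partition
```

Compared with hashing the user id straight to a server, the explicit table makes it possible to move a single hot user to another partition while every other mapping stays put.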

 

  • Automatic Management and Provisioning:

1 Be restartable and redundant

2 Support geo-distribution.

3 Automatic provisioning and installation

4 Configuration and code as a unit.

5 Manage server roles or personalities rather than servers.

6 Multi-system failures are common. ****

7 Recover at the service level.

8 Never rely on local storage for non-recoverable information.

9 Keep deployment simple.

10 Fail services regularly. ****

 

  • Dependency Management

1 Expect latency.
  Ensure all interactions have appropriate timeouts. ***

2 Isolate failures. ***

3 Use shipping and proven components. ???

4 Implement inter-service monitoring and alerting.

5 Dependent services require the same design point.
  Same SLA as the depending service.

6 Decouple components.
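
Item 1 (Expect latency; ensure all interactions have appropriate timeouts) can be sketched with the standard library alone. `call_with_timeout` and `slow_dependency` are hypothetical names; the fallback value stands in for whatever degraded answer the caller can live with.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def call_with_timeout(fn, timeout_s, fallback):
    """Run fn in a worker thread; return fallback if it takes longer than timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        # The caller stops waiting here, but the worker thread itself
        # keeps running until fn returns; Python cannot kill it.
        return fallback
    finally:
        pool.shutdown(wait=False)

def slow_dependency():
    time.sleep(0.3)  # stands in for a dependency that is having a bad day
    return "fresh data"
```

With these definitions, `call_with_timeout(slow_dependency, 0.05, "cached data")` gives up after 50 ms and returns the fallback. For network calls it is usually better to pass a timeout to the client library itself, so the underlying work is actually abandoned.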

 

  • Release Cycle and Testing
      The goal is to minimize the number of engineering and operations interactions.

1 Ship often

2 Use production data to find problems.

3 Invest in engineering

4 Support version roll-back

5 Maintain forward and backward compatibility.

6 Single-server deployment.

7 Stress test for load.

8 Perform capacity and performance testing prior to new releases.

9 Build and deploy shallowly and iteratively.

10 Test with real data. ***

11 Run system-level acceptance tests.

12 Test and develop in full environments.

 

  • Hardware Selection and Standardization

1 Use only standard SKUs.

2 Purchase full racks.

3 Write to a hardware abstraction.

4 Abstract the network and naming.

 

  • Operations and Capacity Planning

1 Make the development team responsible.

2 Soft delete only.

3 Track resource allocation.

4 Make one change at a time.

5 Make Everything Configurable. ****
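
Item 2 (Soft delete only) is easy to show in a few lines. The sketch below uses an in-memory SQLite database; the `accounts` table, the `deleted_at` column, and the helper names are assumptions made up for this illustration.

```python
import sqlite3

def make_db():
    """Create a throwaway in-memory database with two sample accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO accounts (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    return conn

def soft_delete(conn, account_id):
    """Mark the row as deleted instead of removing it."""
    conn.execute(
        "UPDATE accounts SET deleted_at = datetime('now') "
        "WHERE id = ? AND deleted_at IS NULL",
        (account_id,),
    )

def restore(conn, account_id):
    """Undo a soft delete; this is only possible because the row still exists."""
    conn.execute("UPDATE accounts SET deleted_at = NULL WHERE id = ?", (account_id,))

def live_accounts(conn):
    """Names of accounts that have not been soft-deleted."""
    rows = conn.execute(
        "SELECT name FROM accounts WHERE deleted_at IS NULL ORDER BY id"
    )
    return [r[0] for r in rows]
```

The cost is that every read has to filter on `deleted_at IS NULL`; the benefit is that an operator mistake or a buggy batch job can be rolled back with a single `UPDATE`.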

 

  • Auditing, Monitoring and Alerting
    Alerting is an art.

1 Instrument everything.

2 Data is the most valuable asset.

3 Have a customer view of service.

4 Instrumentation required for production testing.

5 Latencies are the toughest problem.

6 Have sufficient production data.

7 Configurable logging.

8 Expose health information for monitoring

9 Make all reported errors actionable.****
   Give enough information to diagnose.

 

  • Graceful Degradation and Admission Control

1 Support a "big red switch".

2 Control admission.

3 Meter admission.
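
One common way to control and meter admission is a token bucket. This is a technique I am swapping in for illustration, not something the paper prescribes; `TokenBucket`, `admit`, and the injectable `clock` are hypothetical names.

```python
import time

class TokenBucket:
    """Admit at most rate_per_s requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = burst
        self.clock = clock          # injectable so the refill can be tested without sleeping
        self.tokens = float(burst)  # start full
        self.last = clock()

    def admit(self):
        """Return True if the request may proceed, False if it should be turned away."""
        now = self.clock()
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Raising `rate_per_s` step by step after an outage is one way to let users back in gradually instead of all at once. A rejected request should get a fast, cheap error so the overloaded backend does no work for it.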

 

  • Customer and Press Communication Plan
    Even without a client, if users interact with the system via web pages.
