Data Exchange Mechanisms and Considerations
The advantages and disadvantages of each of these different web service API styles are beyond the scope of this discussion.
Extract, Transform, and Load (ETL)
File Transfer
Remote Procedure Call
Event-Based/Brokered Messaging
Data Streaming
Considerations in Selecting a Data Exchange Approach
Data set characteristics
Data complexity
Frequency of data updates
Data set size
Data environment characteristics
Data flows and breadth of solution
Frequency of data usage
Data versions
Data security
Data transformation complexity
Connection persistence
Scope Constraints
Every project is constrained in some way, and selecting a data exchange mechanism is no different. At the highest level, the basic ‘scope triangle’ of time, cost and quality cannot be ignored. Time is the time available to deliver the project, cost represents the money or resources available, and quality represents the fitness for purpose that the project must achieve to be a success. Normally one or more of these factors is fixed and the others vary; for example, reducing the time to completion will affect quality and/or cost. Factors such as available technical skills, business strategies and organizational culture may also act as constraints. In addition, it is unlikely that all, or even many, of the data exchange methods discussed above will be supported in a particular case. This is especially true of software as a service (SaaS) applications, where the customer has no control over the data exchange methods the product offers. However, after taking these larger considerations into account, more than one option may remain, and it is those cases that this discussion focuses on.
Organizational Considerations
Consumer characteristics
Human beings and front-facing applications
Receiving system processes
Usage by the receiving system
Summary
The following table summarizes selection criteria and associated data exchange methods.
Method ranking: H = High, M = Medium, L = Low
Typical Use Case | Web Service | Messaging | DB | File |
The data is required in multiple formats | H | M | M | L |
The data is used in a client front-end | H | M | M | L |
The data supports a feature | H | H | M | L |
The data is frequently requested | H | H | M | L |
The data changes frequently | H | H | M | L |
The data is used in a back-end | L | H | H | M |
The data has multiple flows and receivers | M | H | L | L |
The data is requested in multiple versions/schemas | L | L | H | L |
Assembling the data involves multiple entities and/or variable logic | L | L | H | L |
The data set is very large | L | M | H | H |
The data forms the basis of a larger platform | L | L | H | H |
The data must be human readable | L | L | L | H |
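As a rough illustration of how the table might be applied when several use cases hold at once, the sketch below transcribes the rankings into Python and totals them per method. The numeric scores (H = 3, M = 2, L = 1), the shortened use-case keys, and the function name rank_methods are assumptions made for this example only; the table itself does not say how criteria should be combined, and in a real project some criteria may outweigh others.

```python
# Rankings copied from the summary table, in column order:
# (Web Service, Messaging, DB, File). Keys are shortened, hypothetical labels.
METHODS = ("Web Service", "Messaging", "DB", "File")
TABLE = {
    "multiple formats": ("H", "M", "M", "L"),
    "client front-end": ("H", "M", "M", "L"),
    "supports a feature": ("H", "H", "M", "L"),
    "frequently requested": ("H", "H", "M", "L"),
    "changes frequently": ("H", "H", "M", "L"),
    "back-end": ("L", "H", "H", "M"),
    "multiple flows and receivers": ("M", "H", "L", "L"),
    "multiple versions/schemas": ("L", "L", "H", "L"),
    "multiple entities/variable logic": ("L", "L", "H", "L"),
    "very large data set": ("L", "M", "H", "H"),
    "basis of a larger platform": ("L", "L", "H", "H"),
    "human readable": ("L", "L", "L", "H"),
}

# Assumption: treat H/M/L as equally spaced scores and simply add them up.
# The table itself does not prescribe any weighting.
SCORE = {"H": 3, "M": 2, "L": 1}


def rank_methods(use_cases: list[str]) -> list[tuple[str, int]]:
    """Sum the assumed scores for each method and sort best-first.

    Ties keep the table's column order because sorted() is stable,
    even with reverse=True.
    """
    totals = dict.fromkeys(METHODS, 0)
    for case in use_cases:
        for method, grade in zip(METHODS, TABLE[case]):
            totals[method] += SCORE[grade]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(rank_methods(["back-end", "very large data set"]))
```

With these assumed scores, rank_methods(["back-end", "very large data set"]) returns [('DB', 6), ('Messaging', 5), ('File', 5), ('Web Service', 2)]. A tally like this is a starting point for discussion, not a decision.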
Appendix
Data Formats
Text-Based Formats
XML
Advantages:
- Readable and editable by developers
- Validation by means of XML Schema and DTDs
- Can represent complex hierarchies of data
- Unicode gives flexibility for international operation
- Plenty of tools in all major programming languages for both creation and parsing
- Supports namespaces to avoid name conflicts
Disadvantages:
- Bulky text with low payload/formatting ratio
- Both creation and client-side parsing are CPU intensive
- Some common word-processing characters are illegal, and markup characters such as & and < must be escaped
- Images and other binary data require extra encoding (e.g. Base64)
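The escaping and binary-encoding points above can be seen in a small sketch using Python's standard xml.etree.ElementTree module. The element names (customer, name, note, thumbnail), the encoding attribute, and the helper functions are hypothetical; the example only shows that markup characters are escaped automatically and that binary content has to travel as Base64 text.

```python
import base64
import xml.etree.ElementTree as ET


def record_to_xml(name: str, note: str, thumbnail: bytes) -> str:
    """Serialize a small, hypothetical customer record as XML text."""
    root = ET.Element("customer")
    ET.SubElement(root, "name").text = name
    # ElementTree escapes markup characters such as & and < in text content.
    ET.SubElement(root, "note").text = note
    # Binary data cannot be embedded as-is, so it is Base64-encoded first.
    thumb = ET.SubElement(root, "thumbnail", {"encoding": "base64"})
    thumb.text = base64.b64encode(thumbnail).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def xml_to_record(document: str) -> dict:
    """Parse the XML produced by record_to_xml back into a dictionary."""
    root = ET.fromstring(document)
    return {
        "name": root.findtext("name"),
        "note": root.findtext("note"),
        "thumbnail": base64.b64decode(root.findtext("thumbnail")),
    }


if __name__ == "__main__":
    doc = record_to_xml("Zoë Müller", "Net 30 & <rush>", b"\x89PNG\r\n")
    print(doc)
    print(xml_to_record(doc))
```

The printed document contains &amp; and &lt;rush&gt; in place of the raw characters, and the six binary bytes become eight characters of Base64, a small instance of the size overhead noted above.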
JSON
Advantages:
- Readable and editable by developers, easily consumed by web browsers
- Simpler than XML
- Supported by highly developed browser toolkits such as jQuery
Disadvantages:
- Bulky text with low payload/formatting ratio, though less so than XML
- Client CPU time required to parse
- Not as flexible as XML for some data structures, and binary data requires extra encoding
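To make the size comparison with XML concrete, the following sketch serializes one hypothetical record both ways using only Python's standard library. The record, the element-per-list-entry XML layout, and the function names are assumptions; other XML conventions (for example, attributes instead of child elements) give different sizes, so this shows the tendency for one case rather than a general ratio.

```python
import base64
import json
import xml.etree.ElementTree as ET

# A hypothetical record used only for this comparison.
RECORD = {"id": 42, "name": "Widget", "tags": ["blue", "small"], "price": 9.99}


def to_json(record: dict) -> str:
    # Compact separators drop the optional whitespace json.dumps adds by default.
    return json.dumps(record, separators=(",", ":"))


def to_xml(record: dict) -> str:
    root = ET.Element("item")
    for key, value in record.items():
        if isinstance(value, list):
            # XML has no array type; one child element per entry is a common convention.
            parent = ET.SubElement(root, key)
            for entry in value:
                ET.SubElement(parent, "value").text = str(entry)
        else:
            # XML text is untyped, so numbers are written as strings.
            ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def binary_as_json(data: bytes) -> str:
    # JSON has no binary type either; Base64 text is a common workaround.
    return json.dumps({"thumbnail": base64.b64encode(data).decode("ascii")})


if __name__ == "__main__":
    json_text, xml_text = to_json(RECORD), to_xml(RECORD)
    print(len(json_text.encode("utf-8")), json_text)
    print(len(xml_text.encode("utf-8")), xml_text)
    print(binary_as_json(b"\x00\x01\x02"))
```

Note that json.loads restores the number types directly, whereas the XML version needs a schema or application logic to know that id and price are numeric.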
Plain Text
Advantages:
- Readable and editable by developers
- Fairly compact representation for simple types
Disadvantages:
- Possible confusion introduced by punctuation (such as the delimiter) in values
- Limited to very simple structures
- Inherently 'flat'; cannot easily represent hierarchical data
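The punctuation problem is easiest to see with comma-separated values. This sketch, using hypothetical rows and helper names, contrasts a naive join/split round trip with Python's standard csv module, which quotes fields that contain the delimiter or the quote character.

```python
import csv
import io

# Hypothetical rows: the first value contains the delimiter, the second a quote.
ROWS = [["ACME, Inc.", "Springfield", "42"], ['Widgets "R" Us', "Shelbyville", "7"]]


def naive_dump(rows: list[list[str]]) -> str:
    # Joins fields with commas and does nothing about commas inside values.
    return "\n".join(",".join(row) for row in rows)


def naive_load(text: str) -> list[list[str]]:
    return [line.split(",") for line in text.splitlines()]


def csv_dump(rows: list[list[str]]) -> str:
    # csv.writer quotes fields containing the delimiter or quote character.
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def csv_load(text: str) -> list[list[str]]:
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(naive_load(naive_dump(ROWS))[0])  # "ACME, Inc." has been split in two
    print(csv_dump(ROWS))
```

The naive round trip turns the first three-field row into four fields, while the csv round trip returns the rows unchanged. Quoting only helps if sender and receiver agree on the same conventions.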
Binary Based Formats
CORBA
Advantages:
- Language and operating system independent
- Compact data representation
- Built-in Java mapping covers almost all features (though the CORBA modules were removed from the JDK in Java 11)
- Open-source versions are available
Disadvantages:
- Complex, with a steep learning curve
- Not well supported by OS vendors
- Difficult to use if a server and/or client is behind a firewall or if network address translation is being used
Google Protocol Buffers, Avro, Thrift
Advantages:
- Very compact binary representation
- Tools for many languages
- Tolerant of schema version changes, within each format's compatibility rules
- Include schemas and generated documentation
Disadvantages:
- Not readable or editable by developers without tooling
- Yet another data definition syntax to learn
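Much of the compactness of these formats comes from variable-length integer encoding. The sketch below implements only the base-128 varint scheme described in the Protocol Buffers encoding documentation, plus the key for a single varint field; it is not the protobuf library, and Avro and Thrift use related but different encodings. The function names are hypothetical, and real systems should use the generated code and runtime library of their chosen format.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint.

    Each byte carries 7 bits of the value, least significant group first;
    the high bit is set on every byte except the last.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, number_of_bytes_consumed) for the varint at the start of data."""
    result = 0
    shift = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, index + 1
        shift += 7
    raise ValueError("truncated varint")


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one field of wire type 0 (varint).

    The field key is (field_number << 3) | wire_type, itself written as a varint;
    wire type 0 adds nothing to the shifted field number.
    """
    return encode_varint(field_number << 3) + encode_varint(value)


if __name__ == "__main__":
    print(encode_varint(300).hex())
    print(encode_varint_field(1, 150).hex())
```

Field number 1 holding the value 150 encodes to the three bytes 08 96 01; the field name itself is never sent, because both sides share the schema. By comparison, the JSON text {"a":150} is nine bytes.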
Transfer Protocols
FTP (File Transfer Protocol)
FTPS (FTP over SSL/TLS)
HTTP (Hypertext Transfer Protocol)
HTTPS (HTTP over SSL/TLS)
WebSocket
SFTP (SSH File Transfer Protocol)
SCP (Secure Copy)
File sharing protocols (CIFS/SMB and NFS)
AMQP (Advanced Message Queuing Protocol)
LDAP (Lightweight Directory Access Protocol)
AS2 (Applicability Statement 2)
AFTP (Accelerated File Transfer Protocol)
APIs that are confused with protocols
1. A web service is a specific implementation of the Remote Procedure Call pattern. For purposes of this discussion, RPC is used to refer to non-web/HTTP implementations.
2. Localizing data within applications, especially copies of data from systems of record, creates significant data consistency and management problems. The need for large file or database transfer methods may indicate a need for a more maintainable system architecture and design.