Principles and Practice of Big Data

Course
- T-764 Big Data Management.
English description:
Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information, Second Edition updates and expands on the first edition, bringing a set of techniques and algorithms that are tailored to Big Data projects. The book stresses the point that most data analyses conducted on large, complex data sets can be achieved without the use of specialized suites of software (e.g., Hadoop), and without expensive hardware (e.g., supercomputers). The core of every algorithm described in the book can be implemented in a few lines of code using just about any popular programming language (Python snippets are provided). Through the use of multiple new examples, this edition demonstrates that if we understand our data, and if we know how to ask the right questions, we can learn a great deal from large and complex data collections.
The book will assist students and professionals from all scientific backgrounds who are interested in stepping outside the traditional boundaries of their chosen academic disciplines.
- Presents new methodologies that are widely applicable to just about any project involving large and complex datasets
- Offers readers informative new case studies across a range of scientific and engineering disciplines
- Provides insights into semantics, identification, de-identification, vulnerabilities and regulatory/legal issues
- Utilizes a combination of pseudocode and very short snippets of Python code to show readers how they may develop their own projects without downloading or learning new software
Other
- Author: Jules J. Berman
- Edition: 2
- Publication date: 2018-07-23
- No restrictions on printing
- No restrictions on copying
- Format: ePub
- ISBN 13: 9780128156100
- Print ISBN: 9780128156094
- ISBN 10: 0128156104
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- Other Books by Jules J. Berman
- Dedication
- About the Author
- Author's Preface to Second Edition
- Abstract
- Author's Preface to First Edition
- 1: Introduction
- Abstract
- Section 1.1. Definition of Big Data
- Section 1.2. Big Data Versus Small Data
- Section 1.3. Whence Comest Big Data?
- Section 1.4. The Most Common Purpose of Big Data Is to Produce Small Data
- Section 1.5. Big Data Sits at the Center of the Research Universe
- 2: Providing Structure to Unstructured Data
- Abstract
- Section 2.1. Nearly All Data Is Unstructured and Unusable in Its Raw Form
- Section 2.2. Concordances
- Section 2.3. Term Extraction
- Section 2.4. Indexing
- Section 2.5. Autocoding
- Section 2.6. Case Study: Instantly Finding the Precise Location of Any Atom in the Universe (Some Assembly Required)
- Section 2.7. Case Study (Advanced): A Complete Autocoder (in 12 Lines of Python Code)
- Section 2.8. Case Study: Concordances as Transformations of Text
- Section 2.9. Case Study (Advanced): Burrows Wheeler Transform (BWT)
- 3: Identification, Deidentification, and Reidentification
- Abstract
- Section 3.1. What Are Identifiers?
- Section 3.2. Difference Between an Identifier and an Identifier System
- Section 3.3. Generating Unique Identifiers
- Section 3.4. Really Bad Identifier Methods
- Section 3.5. Registering Unique Object Identifiers
- Section 3.6. Deidentification and Reidentification
- Section 3.7. Case Study: Data Scrubbing
- Section 3.8. Case Study (Advanced): Identifiers in Image Headers
- Section 3.9. Case Study: One-Way Hashes
- 4: Metadata, Semantics, and Triples
- Abstract
- Section 4.1. Metadata
- Section 4.2. eXtensible Markup Language
- Section 4.3. Semantics and Triples
- Section 4.4. Namespaces
- Section 4.5. Case Study: A Syntax for Triples
- Section 4.6. Case Study: Dublin Core
- 5: Classifications and Ontologies
- Abstract
- Section 5.1. It's All About Object Relationships
- Section 5.2. Classifications, the Simplest of Ontologies
- Section 5.3. Ontologies, Classes With Multiple Parents
- Section 5.4. Choosing a Class Model
- Section 5.5. Class Blending
- Section 5.6. Common Pitfalls in Ontology Development
- Section 5.7. Case Study: An Upper Level Ontology
- Section 5.8. Case Study (Advanced): Paradoxes
- Section 5.9. Case Study (Advanced): RDF Schemas and Class Properties
- Section 5.10. Case Study (Advanced): Visualizing Class Relationships
- 6: Introspection
- Abstract
- Section 6.1. Knowledge of Self
- Section 6.2. Data Objects: The Essential Ingredient of Every Big Data Collection
- Section 6.3. How Big Data Uses Introspection
- Section 6.4. Case Study: Time Stamping Data
- Section 6.5. Case Study: A Visit to the TripleStore
- Section 6.6. Case Study (Advanced): Proof That Big Data Must Be Object-Oriented
- 7: Standards and Data Integration
- Abstract
- Section 7.1. Standards
- Section 7.2. Specifications Versus Standards
- Section 7.3. Versioning
- Section 7.4. Compliance Issues
- Section 7.5. Case Study: Standardizing the Chocolate Teapot
- 8: Immutability and Immortality
- Abstract
- Section 8.1. The Importance of Data That Cannot Change
- Section 8.2. Immutability and Identifiers
- Section 8.3. Coping With the Data That Data Creates
- Section 8.4. Reconciling Identifiers Across Institutions
- Section 8.5. Case Study: The Trusted Timestamp
- Section 8.6. Case Study: Blockchains and Distributed Ledgers
- Section 8.7. Case Study (Advanced): Zero-Knowledge Reconciliation
- 9: Assessing the Adequacy of a Big Data Resource
- Abstract
- Section 9.1. Looking at the Data
- Section 9.2. The Minimal Necessary Properties of Big Data
- Section 9.3. Data That Comes With Conditions
- Section 9.4. Case Study: Utilities for Viewing and Searching Large Files
- Section 9.5. Case Study: Flattened Data
- 10: Measurement
- Abstract
- Section 10.1. Accuracy and Precision
- Section 10.2. Data Range
- Section 10.3. Counting
- Section 10.4. Normalizing and Transforming Your Data
- Section 10.5. Reducing Your Data
- Section 10.6. Understanding Your Control
- Section 10.7. Statistical Significance Without Practical Significance
- Section 10.8. Case Study: Gene Counting
- Section 10.9. Case Study: Early Biometrics, and the Significance of Narrow Data Ranges
- 11: Indispensable Tips for Fast and Simple Big Data Analysis
- Abstract
- Section 11.1. Speed and Scalability
- Section 11.2. Fast Operations, Suitable for Big Data, That Every Computer Supports
- Section 11.3. The Dot Product, a Simple and Fast Correlation Method
- Section 11.4. Clustering
- Section 11.5. Methods for Data Persistence (Without Using a Database)
- Section 11.6. Case Study: Climbing a Classification
- Section 11.7. Case Study (Advanced): A Database Example
- Section 11.8. Case Study (Advanced): NoSQL
- 12: Finding the Clues in Large Collections of Data
- Abstract
- Section 12.1. Denominators
- Section 12.2. Word Frequency Distributions
- Section 12.3. Outliers and Anomalies
- Section 12.4. Back-of-Envelope Analyses
- Section 12.5. Case Study: Predicting User Preferences
- Section 12.6. Case Study: Multimodality in Population Data
- Section 12.7. Case Study: Big and Small Black Holes
- 13: Using Random Numbers to Knock Your Big Data Analytic Problems Down to Size
- Abstract
- Section 13.1. The Remarkable Utility of (Pseudo)Random Numbers
- Section 13.2. Repeated Sampling
- Section 13.3. Monte Carlo Simulations
- Section 13.4. Case Study: Proving the Central Limit Theorem
- Section 13.5. Case Study: Frequency of Unlikely String of Occurrences
- Section 13.6. Case Study: The Infamous Birthday Problem
- Section 13.7. Case Study (Advanced): The Monty Hall Problem
- Section 13.8. Case Study (Advanced): A Bayesian Analysis
- 14: Special Considerations in Big Data Analysis
- Abstract
- Section 14.1. Theory in Search of Data
- Section 14.2. Data in Search of Theory
- Section 14.3. Bigness Biases
- Section 14.4. Data Subsets in Big Data: Neither Additive Nor Transitive
- Section 14.5. Additional Big Data Pitfalls
- Section 14.6. Case Study (Advanced): Curse of Dimensionality
- 15: Big Data Failures and How to Avoid (Some of) Them
- Abstract
- Section 15.1. Failure Is Common
- Section 15.2. Failed Standards
- Section 15.3. Blaming Complexity
- Section 15.4. An Approach to Big Data That May Work for You
- Section 15.5. After Failure
- Section 15.6. Case Study: Cancer Biomedical Informatics Grid, a Bridge Too Far
- Section 15.7. Case Study: The Gaussian Copula Function
- 16: Data Reanalysis: Much More Important Than Analysis
- Abstract
- Section 16.1. First Analysis (Nearly) Always Wrong
- Section 16.2. Why Reanalysis Is More Important Than Analysis
- Section 16.3. Case Study: Reanalysis of Old JADE Collider Data
- Section 16.4. Case Study: Vindication Through Reanalysis
- Section 16.5. Case Study: Finding New Planets From Old Data
- 17: Repurposing Big Data
- Abstract
- Section 17.1. What Is Data Repurposing?
- Section 17.2. Dark Data, Abandoned Data, and Legacy Data
- Section 17.3. Case Study: From Postal Code to Demographic Keystone
- Section 17.4. Case Study: Scientific Inferencing From a Database of Genetic Sequences
- Section 17.5. Case Study: Linking Global Warming to High-Intensity Hurricanes
- Section 17.6. Case Study: Inferring Climate Trends With Geologic Data
- Section 17.7. Case Study: Lunar Orbiter Image Recovery Project
- 18: Data Sharing and Data Security
- Abstract
- Section 18.1. What Is Data Sharing, and Why Don't We Do More of It?
- Section 18.2. Common Complaints
- Section 18.3. Data Security and Cryptographic Protocols
- Section 18.4. Case Study: Life on Mars
- Section 18.5. Case Study: Personal Identifiers
- 19: Legalities
- Abstract
- Section 19.1. Responsibility for the Accuracy and Legitimacy of Data
- Section 19.2. Rights to Create, Use, and Share the Resource
- Section 19.3. Copyright and Patent Infringements Incurred by Using Standards
- Section 19.4. Protections for Individuals
- Section 19.5. Consent
- Section 19.6. Unconsented Data
- Section 19.7. Privacy Policies
- Section 19.8. Case Study: Timely Access to Big Data
- Section 19.9. Case Study: The Havasupai Story
- 20: Societal Issues
- Abstract
- Section 20.1. How Big Data Is Perceived by the Public
- Section 20.2. Reducing Costs and Increasing Productivity With Big Data
- Section 20.3. Public Mistrust
- Section 20.4. Saving Us From Ourselves
- Section 20.5. Who Is Big Data?
- Section 20.6. Hubris and Hyperbole
- Section 20.7. Case Study: The Citizen Scientists
- Section 20.8. Case Study: 1984, by George Orwell
- Index
ABOUT E-BOOKS AT HEIMKAUP.IS
Your bookshelf is your own space, and your books are stored there. You can access your bookshelf anywhere and at any time on a computer or smart device. Simple and convenient!
E-book to own
An e-book to own must be downloaded to the devices you want to use within one year of the purchase date.
You can access your books anywhere
You can reach all your e-books and e-textbooks in an instant, anywhere and at any time, in your bookshelf. No bag, no e-reader, and no hassle (let alone excess baggage).
Easy to browse and search
You can move between pages and chapters as suits you best and jump straight to specific chapters from the table of contents. The search finds words, chapters, or pages in a single click.
Notes and highlights
You can highlight passages in different colors and write notes freely in the e-book. You can even see the notes and highlights of classmates and teachers if they allow it. All in one place.
What do you want to see? / You decide how the page looks
You adapt the page to your needs. Enlarge or shrink images and text with multi-level zoom to view the page in the way that best suits your studies.
More good features
- You can print pages from the book (within the limits set by the publisher)
- Option to link to other digital and interactive material, such as videos or questions on the content
- Easy to copy and paste content/text, e.g. for homework or essays
- Supports technology that helps students with visual or hearing impairments
- Type: 208
- Author: Jules J. Berman
- Publication year: 2013
- License: 380