TY  - BOOK
AU  - Banerjee, Kyle
TI  - The data wrangler's handbook: simple tools for powerful results
SN  - 9780838919132
AV  - QA76.9.D26 B36 2019eb
U1  - 005.74/3 23
PY  - 2019///
CY  - Chicago
PB  - ALA Neal-Schuman
KW  - Database design
KW  - Data structures (Computer science)
KW  - Information retrieval
KW  - File conversion (Computer science)
KW  - Information Storage and Retrieval
KW  - Bases de données
KW  - Conception
KW  - Structures de données (Informatique)
KW  - Recherche de l'information
KW  - Fichiers (Informatique)
KW  - Conversion
KW  - information retrieval
KW  - aat
KW  - fast
KW  - Electronic books
N1  - Includes bibliographical references and index; Cover -- Title Page -- Copyright Page -- Contents -- List of Figures and Tables -- Acknowledgments -- Introduction -- Chapter 1. Getting Started with the Command Line -- Finding the Command Line -- Mac -- Windows -- Meet the Command Line -- Chapter 2. Command Line Concepts -- Two Powerful Symbols -- Direct Output to a File (Greater than Symbol) -- Direct Output to Another Program (Pipe Symbol) -- Command Substitution -- Regular Expressions-The Swiss Army Knife for Data -- Literal Characters -- Special Characters -- Wildcard Characters -- Logical Operators -- Grouping -- Scripting -- Chapter 3. Understanding Formats, by David Forero -- Chapter 4. Simplify Complicated Problems -- Isolating Specific Data Elements -- Converting Data into Formats That Are Easier to Work With -- Chapter 5. Delimited Text -- CSV (Comma Separated Values) -- Commas and Quotation Marks in CSV Files -- Multiline Fields in CSV Files -- Multivalued Fields in Delimited Files -- Chapter 6. XML -- So What Is XML, Really? -- What Makes XML So Useful? -- Why Is XML So Easy? -- DOM (Document Object Model) -- XPath -- XSLT (eXtensible Stylesheet Language Transformations) -- Working with Large XML Files -- Working with Complex XML Files -- XmlStarlet -- Installing XmlStarlet -- Converting XML Documents -- Chapter 7. JSON (JavaScript Object Notation) -- Chapter 8. Scripting -- Variables -- Arguments -- Conditional Execution -- Loops -- Chapter 9. Solving Common Problems -- Viewing Large Files -- Locating Files That Contain Particular Data -- Finding Files with Specific Characteristics -- Working with Internal Metadata -- Working with APIs -- Combining Data from Different Sources -- Other Tasks -- Chapter 10. Conclusions -- One-Line Wonders -- Locating, Viewing, and Performing Basic File Operations -- Combine Information from Multiple Files into a Single File -- Combine Three Files, Each Consisting of a Single Column, into a Three-Column Table -- Extract 1,000 Random Lines or Records from a File -- Find Files with Specific Characteristics -- Find All Lines in All Files in the Current Directory as Well as All Subdirectories Containing a Regular Expression -- Identify All Files in Current Directories and Subdirectories That Contain a Value -- List All Files in Current Directory and Subdirectories over a 100 MB in Order of Decreasing Size -- List the Names, Pixel Dimensions, and File Sizes of All Files in the Current Directory and Subdirectories in Tab Delimited Format -- Print Line Number of File That Match Occurred On -- Split Large Files into Smaller Chunks with Each File Breaking on a Line -- View 200 Characters Starting at Position 385621 in a File -- View Lines 4369-4374 of a File -- Retrieving and Sending Information over a Network -- Retrieve a Document from the Web and Send It to a File -- Send an XML Document to an API Requiring HTTP Authentication -- Sorting, Counting, Deduplication, and File Comparison
N2  - "Data manipulation and analysis are far easier than you might imagine - in fact, using tools that come standard with your desktop computer, you can learn how to extract, manipulate, and analyze data (and metadata) of any size and complexity. In this handbook, data wizard Banerjee will familiarize you with easily digestible but powerful concepts that will enable you to feel confident working with data. With his expert guidance, you'll learn how to use a single-word command to sort files of any size by any criteria, identify duplicates, and perform numerous other common library tasks; understand data formats, delimited text and CSV files, XML, JSON, scripting, and other key components of data; undertake more sophisticated tasks such as comparing files, converting data from one format to another, reformatting values, combining data from multiple files, and communicating with APIs (Application Programming Interfaces); and save time and stress through simple techniques for transforming text, recognizing symbols that perform important tasks, a Regular Expression cheat sheet, a glossary, and other tools"--
UR  - https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=2405563
ER  - 