How To Download Data Analysis For Mac Numbers
Updated 11/9/2020
Corolla Site
What about cryptographic signing and error messages when you try to install free statistical software for Macs? See our “signing page.” Does this work on Mojave? Is it signed and 64-bit?
Meet the free SPSS clones
I have taught statistics using JASP, Jamovi, and PSPP. Each has advantages and disadvantages, and there is nothing stopping you from using all three depending on what you are trying to do. Ironically, each one has a much faster user interface than SPSS—and all import and export SPSS .sav and syntax files.
Jamovi is a fork of JASP (it was originally based on it); both are still under active development and have fairly similar user interfaces. Both saved a good deal of time and trouble by not reinventing the wheel: they are essentially user interfaces for another statistics program, the hard-to-learn-and-use R.
Feature | JASP | Jamovi | PSPP |
---|---|---|---|
Regression | Stepwise, forward, backward | Enter (supports multi-step) | Enter (one step) |
Missing values | Program-wide only | By variable | By variable |
Statistics engine | R | R | PSPP |
StatPlus:mac allows Mac users to perform all forms of data analysis from the very basics to complex analysis, including non-parametric and regression analysis, survival and time series analysis, and a wide variety of other methods. Version 7 includes a standalone spreadsheet and can be used without Microsoft Excel or Apple Numbers installed. Apple Numbers is very easy to use compared to most other free statistical analysis software and uses interactive charts and graphs to help you interpret data. A bit like with Wizard for Mac, you can bring stats to life with animated charts which you can control by dragging a slider to highlight important changes in numbers.
The programs have spreadsheet-like data editors, but it's best to prepare information for them somewhere else; they let you compute variables, but in a clunky and hard to use way. Importing variable labels and missing values from SPSS files sometimes fails (I've only seen the missing values problem on Windows), a major drawback to programs that will read SPSS files otherwise.
Feature | JASP and Jamovi | PSPP |
---|---|---|
t-tests | Shared variance only | Shared and unique variance |
Cost | Free | Free |
Output | Copies as tables | Copies as plain text |
More modules? | Yes | No |
Windows | One triple-pane window | Three windows |
Syntax | No | Yes, SPSS |
Contextual help | Yes, nicely integrated | No |
Can log commands to a file | Yes (can’t easily replay) | In theory/not working |
Mac open/save/print boxes | No | No |
Can use as SPV file viewer | | Yes |
JASP and Jamovi share lightning-fast speed; a wide range of statistics, with extra plugins on Jamovi; and easy installation on Macs, Windows, and Linux. Their basic interface has an Office 365-style open/save/print/export tab; options on the left, output on the right layout; instant changes to the output if you change the input; and export of both data and output, as desired.
There’s a third SPSS clone, one which keeps most of the user interface from a relatively ancient version of SPSS: PSPP. At the moment there are some nasty bugs, but overall it might be easier for many people to use than JASP and Jamovi. It's easier to master if you're used to SPSS, but development has been very slow and JASP and Jamovi may be better options for that reason—unless you do a lot of computes and recodes and other data manipulation, or do a lot of t-tests. Then PSPP is the best choice.
The SPSS clones in more detail
PSPP: First clone of SPSS, with many bugs stomped out
Current Version: 1.41; Mac up to 1.4
Listing updated 12/11/2020
Last known software update: August 2020 (Mac version)
Note: Catalina version stable; unsigned software; MacPorts, Homebrew versions as well as binary
PSPP is a free SPSS clone with a Mac version you can download from this site (it’s unsigned). It is also in MacPorts, but that's another level of effort. The pre-compiled Mac version is under 60 MB, while some other free software can take half a gigabyte; it loads almost instantly. SPSS is far slower in both calculations and launching. The user interface is nice and fast.
PSPP is aimed at social scientists, business people, and students, with a convenient, easy to learn interface. It is not quite as easy to install as it could be, unless you run Linux, which is its ideal environment (PSPPire).
The interface is similar to SPSS, though there are some oddities including having menus in the windows and not in the menubar; and using its own version of the open/save dialogue box. It includes common folders in the open/save box, including Desktop, Home, and disk root, but lacks custom folders (including OneDrive and Dropbox) you may have in your Finder sidebar. (Version 1.4 fixed problems with crashing on use of arrow keys in the open/save dialogue and autorecode. Mojave-specific bugs: “Recently used files” do not work, and the program still can't find .sav files unless you specify 'all files.' Oddly enough, these are not issues in Catalina.)
PSPP imports SPSS data files, long variable names, and variable and value labels. Common options are included in some dialogue boxes without the need to dig deeper. Development seemed to accelerate in the last few years, though it's not moving as quickly as JASP and Jamovi are.
While you can copy from the output window, you have to copy from the left-hand contents, not from the main pane. The output window yields plain-text, delimited by spaces and pipes, just as SPSS 4 did. That’s not ideal for importing to spreadsheets or word processors, unless you’re really, really good at using BBEdit. There’s also no way to clear anything from the output window; and you have to use control keys instead of command keys.
The capabilities are impressive, including graphing, data transformation, crosstabs, tables, various t-tests, ANOVA, regressions, factor analysis, ROC curves, and nonparametric tests. It’s a fine way to avoid spending thousands of dollars on the big cheese. A great deal of work has gone into the analyses themselves, and the routines the program does run are well fleshed out.
The user interface can be awkward, but it’s fast both in the user interface and in the calculations; while on SPSS it takes a long time for windows to form and disappear.
Oh, and one more thing: you can get rid of the slow, buggy 740 megabyte SPSS “SmartViewer.” PSPP will happily open SPV files with the original formatting. It's amazing.
JASP: the first really good effort at making R more accessible
Current Version: 0.13.1
Listing updated: July 2020; program updated in 2020
Not signed by Apple (you may see a warning)
JASP was created as “a low fat alternative to SPSS, a delicious alternative to R,” and comes out of the University of Amsterdam (its 650MB weight is below SPSS’ gigabyte-plus).
JASP uses the native open/save dialogue box, albeit with a weird Microsoft Office-style setup requiring more than one click; and JASP is easy to install. It would be very nice if it was signed.
The software looks and feels like SPSS to a degree; it feels almost as native as SPSS. Calculations and screen drawing are far, far, far faster than in “real SPSS”: when you select the tests, they might actually be pumped out before your finger is fully off the mouse. Stepwise regression is supported (unlike Jamovi). However, when you do t-tests, if equal variances are not present, it only prints out a warning, rather than using the alternative method of calculating t.
We loaded our test file instantly — and ran descriptives instantly. Survey researchers will be happy to know they can assign value labels — and unhappy to know they must be done variable by variable, without syntax. The labels are retroactively applied to whatever is in the output window, very rapidly. Unfortunately, too, variable labels are not supported (though value labels are)—nor is there a clear way to compute new variables. Presumably one has to export the data, make the changes, and bring it back again. You can't mark missing values variable-by-variable; missing values are applied to the entire dataset, which is a bit nuts.
Oddly, t-tests in both JASP and Jamovi are done only with the assumption of shared variance, and a warning to tell you if that assumption is violated.
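To make the shared-variance point concrete, here is a minimal Python sketch of the two ways of calculating t: the pooled (shared variance) version and Welch's version, which lets each group keep its own variance. The function names and sample numbers are made up for illustration; none of this is taken from JASP, Jamovi, or PSPP.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t, assuming both groups share one variance."""
    na, nb = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t, which does not assume equal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical data: group_b is much more spread out than group_a.
group_a = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1]
group_b = [3.0, 6.5, 2.2, 7.1, 4.0, 5.9, 1.8, 6.8]

print("pooled (t, df):", pooled_t(group_a, group_b))
print("welch  (t, df):", welch_t(group_a, group_b))
```

When the two groups are the same size, both versions give the same t and differ only in degrees of freedom; with unequal sizes and unequal spreads, the two can diverge noticeably.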
JASP is still being developed fairly quickly; but the lack of variable labels is a major drawback compared with PSPP. The clever user interface, allowing users to go back and change things in a past run simply by clicking on it in the output pane, is pretty cool, though (and shared with Jamovi); and the speed is terrific, if not quite at Stata levels. Pretty much everything is instant, while on SPSS it takes a long time for windows to form and disappear.
JASP’s advantage over Jamovi is that it supports forward, backward, and stepwise regression, while Jamovi only supports 'Enter.' The reason is ideology, so we don’t expect that to change. There is a great deal of documentation at the site Jasp for Nonprofits which sadly has not been updated since 2016, and the newish (2019) book Learning Statistics with JASP. There is also a new Machine Learning module with 13 “analyses that can be used for supervised and unsupervised learning.”
Dive more deeply into JASP (full MacStats review).
Jamovi: deceptively powerful
Current Version: 1.25
Listing updated: 7/2020; program updated 2020
Cryptographically signed by Apple
Jamovi: A free, open source package, built on top of an R foundation (Thanks, Dr. Kim-Oliver Tietze). Don’t let that put you off: Jamovi uses a simple spreadsheet interface with full graphics, and while it allows you to use syntax, you can also use menus. You can edit via spreadsheet; and your data, analyses, and options are saved in a single file, so others can reproduce your work. A large number of analyses are easy to find, or you can use R syntax.
The results are attractive (see above), with menus that will be familiar to any SPSS users — and with many options. Copying and pasting output is cleverly done; right-click on a section of output, and you can paste it into Word as a nicely formatted table. Paste into BBEdit, and it will be plain-text, formatted with spaces. Plots can also be copied and pasted, but seem to be limited to screen resolution; there are three built in plot themes, including an SPSS-clone one.
A syntax mode shows the generated R syntax for each menu command, helping you to learn R syntax or make scripts to reproduce the same actions over and over, except for importing data. Data can be imported in numerous ways, including formatted SPSS files and, according to the programmers, SAS and Stata files. When we imported an SPSS file, value labels came through, but it does not support variable labels at all. Likewise, it did not export variable labels consistently. Export from Jamovi to SPSS resulted in errors on some data files as the number of characters in some fields was not correctly marked.
Jamovi is fairly fast, but (like PSPP) doesn’t fully use the Mac interface; pretty much everything is instant, while on SPSS it takes a long time for windows to form and disappear.
Jamovi’s menus are kept within its own window instead of at the top of the screen, and the open/save dialogue box is very different, though it does show shortcuts for the documents, downloads, desktop, and home folders (it also has the odd new Microsoft approach to open/save/print, creating a whole new window/interface for it). You can, however, drag and drop data files onto it — saving time.
One downside: for ideological reasons, you only get Enter for linear regression. Now, though, it allows you to do multiple blocks, so you can still do sensible multiple regressions. Also, as with JASP, it will do t-tests only one way, assuming equal variances, giving you a footnote to tell you if the assumption has been violated.
Newer versions of Jamovi support having different missing values for each variable, an advantage over JASP.
Developer Jonathon Love pointed us to the Jamovi library of extra procedures, which is expanding fairly rapidly. A long, well-illustrated Jamovi blog post also goes over the fine graphics capabilities within Jamovi, which PSPP can only dream of.
The program is almost 700 megabytes in size, due largely to the integrated software: R, Electron, Mantle, Python, and ReactiveCocoa. Accuracy is pretty much assured by the R underpinnings.
Dive more deeply into Jamovi (full MacStats review).
Free and promising general statistics software (other than the SPSS clones)
Past 4 (PAleontological STatistics): an absurdly wide-ranging, easy to use package
Current Version: 4.03
Listing updated: 8/2020 (program updated July 2020)
64-bit and Catalina capable
Not signed by Apple (you may see a warning)
“Past is free software for scientific data analysis, with functions for data manipulation, plotting, univariate and multivariate statistics, ecological analysis, time series and spatial analysis, morphometrics and stratigraphy.” That said, Dennis Helsel wrote, “While its name shows its origin (Paleontology), it is a full-fledged stat package which includes multivariate and permutation tests, with a nice interface.” There is good support for geographical and map-based statistics.
When Dennis says “full-fledged,” he isn’t kidding — the range of this software is stunning. Yet, the download is a mere 10 MB — far, far, far less than many others. What’s more, every new version brings a wide range of new features.
Our test file imported in less than a second, but be warned that import formats are limited and exclude SPSS files; some rather esoteric formats are accepted, though, and you can copy and paste from Excel (with caution). Summary statistics came in a fraction of a second on a laptop. Our survey file never caused more than a slight pause. In the past, very large files choked the software, but we haven't tested version 4 yet.
PDF manual. Dive more deeply into PAST (full MacStats review).
Other free general statistics software
SageMath
64-bit compatible
Current Version: 8.7
Listing updated: 4-1-19
Size: 3.5 GB (yes, GB)
SageMath is not specifically for statistics; it’s general math software, but it has the ability to do numerous statistical processes including graphing/plotting. It can be used for just about any type of math, and can be used either with the command line or from a web browser. You can install it onto a server if you want, and create embedded graphics, typeset-style math expressions, and more; it also includes sharing. The program was designed for both education and research. It is not a typical Mac program; it has a command line element and is accessed from browsers.
SageMath was built atop existing packages including NumPy, SciPy, matplotlib, Sympy, Maxima, GAP, FLINT, and R.
Regress+
Current Version: 2.8 (updated May 2019; prior version was dated May 2017)
Listing updated: August 2019
64-bit, signed, works well in Mojave
Michael McLaughlin’s Regress+ is a free package that includes regression, stochastic modeling, bootstrapping and robust goodness of fit measures. The software and a tutorial are available at the Regress+ web site. Older versions are still available for older operating systems, while version 2.5 is available for OS X and 9.2.
The program is accompanied by full documentation in PDF form which doubles as a statistics reference guide.
Regress+ 2.7, née Regress+ 3.0, was a complete rewrite; it added data modeling (equations and distributions), extensive documentation, and publication quality graphics. Regress+ 2.8 was a substantial upgrade.
This program appears to cover every aspect of regression you can think of. It's graphically oriented but has strong statistics. The code is “more than 100 times faster than before [2.7].”
SOFA Statistics
Version 1.46; Listing updated August 2019; Code updated 11/2017
Windows, Linux versions at version 1.52 as of 7/2019
Not signed; 64-bit
SOFA Statistics (Statistics Open For All) emphasizes ease of use, discoverability, and clean reporting. It can connect directly to database sources, or use data brought in from spreadsheets. The usual statistical processes are available, including one-way ANOVA, t-tests, signed ranks, chi-square, and R; nested tables can be produced with row and column percentages, totals, standard deviations, means, medians, and sums.
SOFA Statistics is written in Python, using a wxPython widget toolkit. Statistics come via the Scipy stats module. Analysis and reporting can be automated using Python scripts, either exported from SOFA or written by hand.
Data can be brought in from Google spreadsheets and CSV files. Dynamic charts use html, SVG, and Javascript. This project was under rapid development for a while, but updates have slowed down. The last blog entry was in January 2017, and there were just two in 2016, following a fairly busy 2009-2015. I was unable to open the latest SOFA in High Sierra in November 2018; the program opened and immediately crashed. Note: Same in July 2019. However, Windows and Linux versions are still good. Preparing to move to dead software page.
Statistics101
Configurations: Requires Java; should work on Intel and PPC Macs
Current Version: 4.9
Listing updated: 8/2019
Software updated: 1/30/2019
Not signed
Statistics101 is giftware to help teach probability and statistics the easy way—by simulation. “Gain deeper understanding of traditional statistics concepts and methods. Increase your awareness of the role of variability in probability and statistics. Learn and apply simple to very sophisticated statistical techniques without tables or complicated formulas.” Interprets and executes the simple “Resampling Stats” programming language. The original Resampling Stats language and computer program were developed by Dr. Julian Simon and Peter Bruce to teach statistics.
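As a rough illustration of what “statistics by simulation” means, here is a short resampling sketch. It is written in Python rather than the Resampling Stats language, and the function name, the data, and the choice of a percentile bootstrap are all assumptions made for the example.

```python
import random
from statistics import mean


def bootstrap_ci(data, n_resamples=5000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Draw samples with replacement, the same size as the data, and record each mean.
    means = sorted(
        mean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical answers to one five-point survey question.
scores = [3, 4, 4, 5, 2, 5, 4, 3, 5, 4, 1, 4, 5, 3, 4]
low, high = bootstrap_ci(scores)
print(f"sample mean {mean(scores):.2f}, 95% interval {low:.2f} to {high:.2f}")
```

No tables or formulas are needed: the interval comes straight from the spread of the resampled means.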
A somewhat steeper learning curve or tougher install
Salstat
Python software / Listing updated: 10/19/2018 / Software updated 2014
Last news on the web site is from 2014
Salstat dates back to the early 2000s and runs on Python; installing the free version on the Mac may require quite a bit of library-and-Python downloading, but a paid version makes everything easy. There is a reward to the work of installation, though, in a free program which makes highly presentable graphics, is relatively easy to use, provides a great deal of descriptive statistics with parametric and nonparametric tests, shows its own source code, does crosstabs, and “charts, imports CSV, HTML, XML, Excel, LibreOffice and SAS file formats, and can even scrape tables of data from web pages.” The source code listing claims a last-update date of 2014.
A post in April 2018 claimed the company was creating a new version of Salstat.
R (CRAN) / “R for Mac OS X” / R.App and R GUI
Current Version: 3.6; under active development
Signed, 64-bit, requires XQuartz
Listing updated 8/2019
This is an exceedingly flexible program, with a large number of libraries and built in routines, and the ability to run many S or S-Plus programs. R loads and runs quickly but has a steep learning curve.
R programs and algorithms are distributed by the Comprehensive R Archive Network (CRAN). A simple and somewhat frustrating graphic user interface is included for Mac users; R Commander can be installed using the built-in package installer, which can also install file import features (which aren't installed by default). R Commander is an X11 program, which means it uses an alien interface and has odd open/save dialogues, but if you get past that it offers menu driven commands not dissimilar from, say, SPSS, just a lot more awkward to use, and without an output or data window.
There are now numerous front ends for R, several of which are mentioned earlier on this page.
R has a massive range of tests, PDF and PostScript output, a function to expand zip archives, and numerous other unexpected features. For much more information about R, including advantages, drawbacks, resources, and tips, see our incredibly outdated R statistics software for the Mac page.
R Studio
Current Version: 1.1x
Listing updated: 4/2018; program updated 3/2018
Signed by Apple
R Studio is commercial open-source software, designed for creating and managing R applications rather than, say, doing exploratory research or testing the odd hypothesis. With frills, it can get expensive, but without frills, it’s free. The Mac version seems to be developed at the same time as Linux and Windows versions. It’s a bit of a porker (500MB plus R itself at around 130MB) and requires a separate R download; R itself is updated regularly and has a signed Mac package.
When you first load R Studio, it tells you to go back and install R. Once you've done that and restarted, it finds R easily enough, and presents you with an integrated development environment (IDE). If you try to do something, such as importing SPSS data, that isn’t possible without further downloads, it automatically connects to the Internet and installs whatever you need. The user interface is Mac-standard in most ways — you get a menu at the top of the screen (as well as menus in the window itself), and the open/save dialogues are thankfully quite normal.
Though you can manage your R installation from R Studio, it’s a tool for dealing directly with syntax, and for managing projects; it’s not a beginner’s tool (as, say, Jamovi can be). R Studio never claims to be anything but an IDE, with many options and good operating-system integration.
Specialty tools
MacMCMC
Current Version: 1.0
Listing updated: 2/2019 (program updated 2/2019)
Signed, 64-bit; good for El Capitan through Mojave
Currently just 4.6 MB, no license; intended to become open source at a later date
From the writer of Regress+ comes a free, powerful program to analyze any kind of data. MacMCMC is part of a two-part set—the other part being a free ebook. Data can be imported from plain text (UTF-8). There are 27 built-in distributions, including 16 continuous, 8 discrete, and three homogeneous mixtures — Normal, Bivariate Normal, and Poisson; users can also define their own distributions. There are 15 built-in functions. Reports include MAP, mean, median, mode, and Gelman-Rubin; credible intervals; trace; plots of marginals; and trace comparison for selected chains. The program has other features, described on its web site, along with a sample input, data, model, and output.
Advantages of MacMCMC, in addition to its price, include being a complete standalone Mac program (hence its small size and fast operation); 100% Bayesian inference; parallel processing; and access to low-level options. Users can check for updates from a dropdown menu. The basic method of using the program is to set up the model via a simple text format, easily figured out from the examples or the ebook; load data (in ASCII format); run Compile, run Setup, change any parameters desired, and then run. That yields a plain-text report and a graph which can be adjusted as needed.
G*Power
Current Version: 3.194 / requires OS 10.7-10.13
Older versions: 680x0; PowerPC; OS X (Universal Binary); Windows and DOS
Listing updated: 8/2019 (program updated Feb, 2019)
Signed, 64-bit; no mention of Mojave
G*Power was developed by Axel Buchner to provide power analyses for the most common statistical tests in behavioral research: t-tests, F-tests (including ANOVA, regression, etc.), and Chi-squared tests. G*Power computes power values for sample sizes, effect sizes, and alpha levels; sample sizes for given effect sizes, alpha levels, and power values; and alpha and beta values for given sample sizes, effect sizes, and beta/alpha ratios. It is a remarkably small program, just over 2 MB in size. Updates (for both Mac and Windows) are slow, with nothing but bug-fixes since March 10, 2014. Version 3.1 itself dates back to 2009, though there were numerous improvements from 2009 to 2014.
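The kind of calculation described above can be sketched in a few lines of Python. This is a normal-approximation version for a two-sided, two-group comparison with effect size d (Cohen's d); it is not G*Power's own method, which works from exact noncentral distributions and will give slightly different answers. The function names are made up for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test of effect size d."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected test statistic under the alternative
    # Chance of landing beyond either critical value.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def n_for_power(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size needed to reach the requested power."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_pow = Z.inv_cdf(power)
    return ceil(2 * ((z_crit + z_pow) / d) ** 2)


n = n_for_power(0.5)
print("per-group n for d = 0.5:", n)
print("power at that n:", round(power_two_sample(0.5, n), 3))
```

The same two functions cover the first two uses listed above: power for a given sample size, effect size, and alpha; and sample size for a given effect size, alpha, and power.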
gretl
gretl can do general statistical routines and many specialized ones; it is in our “special purpose and general math programs” page.
GMT (“The Generic Mapping Tools”)
Current version: 5.45
Program updated: Jan 4, 2019
Listing updated: 9-2-2019
Command-line tools that run on Unix-like systems, including Mac OS X. See https://github.com/GenericMappingTools/gmt for details. Many of the main developers, including Paul Wessel, use Mac OS X. From their site, GMT is..
.. about 80 command-line tools for manipulating geographic and Cartesian data sets (including filtering, trend fitting, gridding, projecting, etc.) and producing PostScript illustrations ranging from simple x–y plots via contour maps to artificially illuminated surfaces and 3D perspective views; the GMT supplements add another 40 more specialized and discipline-specific tools. GMT supports over 30 map projections and transformations and requires support data such as GSHHG coastlines, rivers, and political boundaries and optionally DCW country polygons. GMT is developed and maintained by Paul Wessel, Walter H. F. Smith, Remko Scharroo, Joaquim Luis and Florian Wobbe, with help from a global set of volunteers, and is supported by the National Science Foundation. It is released under the GNU Lesser General Public License version 3 or any later version.
Graphviz and Instaviz
Configurations: PPC (older versions), 10.5+ (current)
Current Version: 2.20.3
Software updated prior to 7-7-13
Listing updated 9-2-2019
Graphviz is the AT&T open source drawing package. The Mac OS X version and the overall project have their own web sites. The OS X version uses the Aqua user interface. Prepare for a steep learning curve, but it may be worth it if you have graphs you do frequently; it is not what I'd suggest for the occasional one-off, though. Note that Graphviz does not seem to have had any development for around six years, but Instaviz, an iOS version, is available on the App Store; it has shape recognition so finger sketches can become graphs for flowcharts.
The Graphviz (Mac version) description on their web site was last updated in April 2008. InstaViz, on the other hand, is selling for $8 on the App Store, and was last updated with version 3.8 in 2016.
gnuplot
Configurations: PPC (older versions), Intel (current)
Current Version: 4.6.3
Listing updated: 7-7-13
Program updated: 4-18-13
gnuplot is open source scientific plotting software. It is available online from many sources.
OpenEpi
Current Version: 2.2.1
Last update: 4-6-2013
Listing updated: 4-24-2018
Kevin Sullivan’s open source OpenEpi software is available in four languages; unlike most software, it can be run from a web server or on a regular computer. The programs are written in Javascript and html and should be compatible with Macs and Linux and Windows machines. Test results are provided for each module to allow people to check reliability of their own setup. The software is set up for epidemiology and has numerous key statistics for that field, along with the usual means, medians, t-tests, ANOVAs, powers, etc.
StatCrunch
StatCrunch is freely available for web-based use, currently without advertisements, with a $5 per user fee for use on your own server, or $5/six months. It has the usual range of basic statistics, from t-tests to regression to ANOVA and nonparametric tests, with a wide range of graphs also available, and works from Excel or text files. StatCrunch will also store your data within reason. For those with low budgets or infrequent needs, StatCrunch's fairly easy to use interface and price are extremely attractive (it also makes sharing data easy).
Libraries
Matplotlib
Free - open source - for Mac OS X
Current version: 1.2.1
Report updated: 7/2013
Matplotlib is a pure Python plotting library with the goal of making publication quality plots using a syntax familiar to MATLAB users. The library uses NumPy (the successor to Numeric) for handling large data sets and supports a variety of output backends.
On August 28 2012, John D. Hunter, the creator of matplotlib, died from complications arising from cancer treatment, after a brief but intense battle with this terrible illness. Please consider making a donation to the John Hunter Memorial Fund.
SciPy
SciPy is a library of scientific tools for Python which supplements NumPy (the successor to the Numeric module). SciPy includes modules for graphics and plotting, optimization, integration, special functions, signal and image processing, genetic algorithms, ODE solvers, and others.
VTK (Visualization Toolkit)
May be compiled from source code for OS X, Linux, etc
Latest version: 7.1
Listing updated 1/2017
The Visualization ToolKit (VTK) is a system for 3D computer graphics, image processing, and visualization with several interface layers. VTK applications can be written directly in C++, Tcl, Java, or Python.
“VTK supports a wide variety of visualization algorithms including scalar, vector, tensor, texture, and volumetric methods; and advanced modeling techniques like implicit modelling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation. Moreover, we have directly integrated dozens of imaging algorithms into the system so you can mix 2D imaging / 3D graphics algorithms and data.”
Also see:
- Graphing and visualization software (including packages that do statistical routines, e.g. Aabel)
About our test survey file
Our test survey file: The “survey file” has 1,000 cases, with 40 questions on a five-point scale, two irrelevant variables (screen width and height), and a couple of demographics (shown here as “job type” and “new or old hire”). We are planning to run the same tests on each package as time goes on.
Books by MacStats maintainer David Zatz • MacStats created in 1996 by Dr. Joel West; edited since 2005 by Dr. David Zatz of Toolpack Consulting. Copyright © 2005-2020 Zatz LLC. All rights reserved. Contact us.
Most of us walk around carrying a small, sensor-infused computer. We call these devices “smartphones,” and they have more computing power and memory than the Apollo Space Capsules did when they went to the moon. Our phones contain sensors that detect movements, determine magnetic north, and even pinpoint us in relation to rotating satellites.
Our smartphones are incredible mini-trackers that can be used for both good and bad. On the good side, they can be used to help us know more about our health and behaviors. On the bad side, a lot of talk centers on privacy concerns, especially in relation to social media and internet usage, but also going back to revelations about government surveillance of our smartphone data. People seem worried about privacy and personal data, even though few know what data they actually have.
We should promote greater data protection and privacy, but we shouldn’t ignore the incredible opportunities we can gain from personal data too. So, while the bulk of the discussion about personal data these days is on the negatives, like data leaks and data privacy, I believe it’s a good time to try to understand the actual data we do have and how personal data and self-tracking might be used for self-improvement and even self-transformation.
For example, one of the most robust repositories about human health is on our smartphones, wearables and activity trackers. Leveraging a few sensors, our phones and wearables are able to interpret our movement patterns and tell us how many steps we took, how many stairs we climbed, how often we stood up, and many other activities. If you use a wearable with a Heart Rate Sensor, you can also capture your resting, active and sleeping heart rate and even know how long you slept too.
There are various ways and reasons why people track their lives, but when it comes to recording their daily movements, the most common method is with a wearable, activity tracker or smart watch. According to a Statista infographic, the most used wearables today are Fitbit, Apple Watch, Garmin, Mi-Band from XiaoMi, and Fossil. Interestingly, there are dozens of other devices with a much smaller market share but which offer an additional array of sensors to track other data points like blood pressure and HRV.
I recently created an open source project called Quantified Self Ledger. It is a collection of Python scripts that help to collect, process and aggregate data from various services like Fitbit, Apple Health, RescueTime and more. The initial goal is to collect and aggregate various self-tracking data. The end goal is to build a personal data dashboard and hopefully one day leverage it for more sophisticated data science and machine learning. In this post, I want to look at Apple Health: specifically, how to export, parse and do some data analysis on your Apple Health data using Python. In later posts, we will look at a few other data points and tracking services.
If you are an Apple user, then your iPhone has been tracking your steps and a host of other health metrics. Some are directly recorded by the phone. Others are logged via other health apps that store their data into the Apple Health repository. If you also regularly wear an Apple Watch during the day, during workouts and at night, then you have even more data, like Heart Rate, VO2 Max, and possibly even Sleep.
In this post, we will be exploring Apple Health Data. First, we will look at some methods for exporting your Apple Health data, either using Apple’s raw export or an aggregated version using QS Access app. Second, we will then use some code to parse and process our raw Apple Health logs into more usable formats. Third, we will do some data exploration and data processing, so we can understand patterns and trends. Finally, we use this data to create some data visualizations in Python.
Whether you are merely curious or are trying to use tracking to support lifestyle changes and better habits, hopefully by the end of this post, you’ll understand what data you are collecting and can start engaging with that data.
Exporting and Preparing Your Apple Health Data
As an Apple user, all of your health data gets logged into a locally stored system called Apple Health. Originally set up as local storage for health data, the Apple Health app has evolved to provide some decent visualizations of your recent activities and appears to be positioned to become a key part of storing and managing your electronic medical record (EMR) too.
While the Apple Health app provides a decent look into your health and movements, you aren’t really going to be able to do any data analysis there. Instead, you’ll either need to use a third party app or external program to explore your data. Fortunately, it’s pretty easy to export your data in its raw state via your iPhone or in a more aggregate form using a third-party app like QS Access.
Data analysis and data visualization can take on a lot of forms. I’m a big fan of data exploration and data visualization with a spreadsheet application like Excel or Google Sheets. To go deeper, Tableau also offers a good way to visualize your data. Ultimately though, if you want to do complete data analysis, data visualization and potentially data modeling, you’ll need to use either R or Python.
Let’s get started by getting some of our Apple Health Data.
QS App: Simplest Way to Export Apple Health Data
The first and easiest option for exporting your Apple Health data is to use the QS Access App, a free iOS app developed by Gary Wolf, Kevin Kelly, and the team at Quantified Self. Its purpose is simple: export your data from Apple Health into a usable format, like CSV, so you can explore it.
After installing the app, select the specific data points you want to export. (NOTE: You may receive a popup about access permissions. Accept these and allow access.) Once you’ve selected the data points you want, hit “Create Table.” The process may take several minutes, depending on the amount of data you have in Apple Health. This delay might be longer if you have several years of data and are exporting your steps and heart rate.
The end result of the QS Access export is a well-structured CSV file, which you can open and explore in any spreadsheet application like Excel or Google Sheets. This is also a good format to use in Tableau, R, and Python.
The one thing to notice about this export format is that it will add blank records for non-data. This means if you export your Blood Pressure data, you’ll end up with potentially thousands of extra and blank rows. In the case of steps, this is a good thing, since you’ll then have noted hours where you did zero walking. In other data points, this export format makes less sense and results in a lot of unnecessary and confusing data.
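As a rough illustration of handling those blank rows, here is a minimal sketch using only Python's standard library. The column names (`Start`, `Steps (count)`, `Weight (kg)`), the date format and the sample rows are assumptions made for illustration, so check the header of your own export before reusing it. Blank cells are treated as zero for steps and skipped for other metrics.

```python
import csv
import io

# Hypothetical sample in the spirit of a QS Access export; real column
# names and date formats may differ.
SAMPLE = """Start,Finish,Steps (count),Weight (kg)
01-Mar-2018 08:00,01-Mar-2018 09:00,412,
01-Mar-2018 09:00,01-Mar-2018 10:00,,
01-Mar-2018 10:00,01-Mar-2018 11:00,1250,70.5
"""


def load_metric(csv_text, column, blank_as_zero=False):
    """Return (start, value) pairs for one column.

    Blank cells become 0.0 when blank_as_zero is True (sensible for steps)
    and are skipped otherwise (sensible for weight or blood pressure).
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        if raw:
            rows.append((row["Start"], float(raw)))
        elif blank_as_zero:
            rows.append((row["Start"], 0.0))
    return rows


steps = load_metric(SAMPLE, "Steps (count)", blank_as_zero=True)
weight = load_metric(SAMPLE, "Weight (kg)")
print(len(steps), "step rows;", len(weight), "weight rows")
```

To read a real file, you would pass its contents in place of `SAMPLE`, for example the text returned by reading the exported CSV from disk.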
The only thing missing from the QS Access export is your workout data. Fortunately, as I explained in detail in How to Track Your Workouts, you can use Workout Export app to get your workout data into a CSV too.
The end result is a row for each workout with key workout metrics:
You can even Export Your Apple Health Workouts to Your Calendar!
If you are not particularly technical and are just looking to get a clean and simple export of your Apple Health data, QS Access and Workout Export apps are your best bet.
How to Get the Raw Export of Your Apple Health Data
Privacy appears to be more important with Apple devices. For example, one of the interesting design choices for Apple Health is that your data is all stored locally on the device. Unlike Google Fit or Fitbit, your data is not automatically synced to the cloud, so you cannot view your Apple Health data on a website or pull it from an API.
In fact, Apple doesn’t have access to this data unless you provide it directly to them in an export. This means that if you lose or break your phone, then you’ll also lose your health data. So, if your data is important, then you should invest in regular backups of your full data to iCloud or at least regularly export your health data.
Admittedly, having your health data in the cloud would make integration and access much, much easier. For example, Google and Fitbit sync your steps and other data points to the cloud. This makes it possible to access your data via their API, as I’ve shown in my Fitbit integration in QS Ledger.
Now that we have looked at a few third-party options like QS Access to export a processed version of your Apple Health data, let’s look at the raw export you can get directly from Apple.
In order to get the raw export, go into the “Apple Health” app, tap on your user icon and then select “Export Health Data.”
This export process may take a few minutes, and, once completed, you should then have a file called “export.zip”. You can share the file with yourself via AirDrop, Email or any other method.
Let’s look at the raw data provided by Apple.
XML Format of Your Apple Health Data
Once you unzip the raw export from Apple Health, you’ll notice a few things.
First, you’ll discover two files: export.xml and export_cda.xml. Unless you are a programmer or technical person, it’s likely you may not be familiar with XML. Extensible Markup Language (XML) is a special markup language that allows you to create well-formatted documents for storing different kinds of information. This structuring and format makes it both human-readable and machine-readable. XML is the format used on most RSS and Podcast Feeds too.
While you can find a number of posts complaining about how unusable or unfriendly this file format is, the reality is that XML is an extremely robust choice. As we will show shortly, it is relatively easy for computers to read, and it can be converted into other formats (like CSV) or imported into a data frame, which is a structured data format used in most data science work.
Second, in looking at these files, you might be surprised how big they are. Your iPhone is collecting and tracking a lot of information. For example, in my case, the zipped export was 37 megabytes, and unzipped the files were well over 900 megabytes. The amount of data here creates some challenges in processing and using it, but it also means there is a lot of data to be used too.
As a side note, this Apple Health data is actually not the raw sensor data. Your watch or phone is interpreting sensor data and then aggregating it into the stored results. So, if you want to go one step further into the raw data that your device’s sensors are recording, check out apps like SensorLog. SensorLog will record the actual numbers from the various sensors, like accelerometer, GPS, pedometer, altitude, gyroscope, motion, audio and more. After only a few minutes of recording, you’ll end up with thousands of results and a log file of several megabytes.
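To give a feel for how readable the format is to a program, here is a minimal sketch using Python's built-in `xml.etree.ElementTree`. The sample is hand-written to resemble `export.xml`, with each measurement stored as a `Record` element whose data sits in attributes. The specific values and the device name `MyAppleWatch` are made up, and real files carry more attributes and record types.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written sample shaped like export.xml (illustrative only).
SAMPLE_XML = """<HealthData locale="en_US">
  <Record type="HKQuantityTypeIdentifierStepCount" sourceName="MyAppleWatch"
          unit="count" startDate="2018-03-01 00:05:00 +0000"
          endDate="2018-03-01 00:10:00 +0000" value="120"/>
  <Record type="HKQuantityTypeIdentifierBodyMass" sourceName="Health"
          unit="kg" startDate="2018-03-01 07:00:00 +0000"
          endDate="2018-03-01 07:00:00 +0000" value="70.5"/>
</HealthData>"""

root = ET.fromstring(SAMPLE_XML)

# Each Record keeps its data in attributes, so attrib is already a flat
# dict that maps directly onto a CSV row or a data frame row.
records = [dict(rec.attrib) for rec in root.iter("Record")]

for rec in records:
    print(rec["type"], rec["value"], rec["unit"])
```

Because every record is just a dictionary of attributes, turning the list into a table is a small step from here.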
Technical Note
Online you’ll find a number of code samples and methods for converting your raw Apple Health XML data to a more usable format.
For our purposes, I’ll be using Python 3 Code, which you can find here: https://github.com/markwk/qs_ledger/tree/master/apple_health. This code is based on In Defence of XML: Exporting and Analysing Apple Health Data and the Python 2 code found at: https://github.com/tdda/applehealthdata.
If you are new to Python and not sure how to get started, I suggest downloading and installing the Anaconda distribution. This is the friendliest setup for getting started with Python for data science, and it comes prepackaged with the most useful extensions.
If you are a programmer in another language, then you should be able to find some open source code to get started in that language on Github.com.
Converting Apple Health XML to CSV with Python
As I noted in a previous section, the raw Apple Health export is in XML format. So, our first task is to convert it into something more useable, like CSV. Alternatively we could process it directly into a data frame or alternative storage model, which we will look at in a later section.
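Before turning to the notebook, here is a hedged sketch of what such a conversion can look like with only the standard library. It is not the code from qs_ledger; the function name `records_to_csv` and the chosen field list are my own for illustration. It streams the file with `iterparse` rather than loading it whole, which matters when the export approaches a gigabyte.

```python
import csv
import xml.etree.ElementTree as ET

# Attributes copied into the CSV; an illustrative subset, not a full list.
FIELDS = ["type", "sourceName", "unit", "startDate", "endDate", "value"]


def records_to_csv(xml_path, csv_path, record_type):
    """Stream Record elements of one type from xml_path into csv_path.

    Returns the number of rows written.
    """
    count = 0
    with open(csv_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        # iterparse yields each element once its closing tag has been read,
        # so the whole file never has to be parsed up front.
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "Record" and elem.get("type") == record_type:
                writer.writerow({f: elem.get(f, "") for f in FIELDS})
                count += 1
            elem.clear()  # release the element's contents to limit memory use
    return count
```

A call such as `records_to_csv("export.xml", "StepCount.csv", "HKQuantityTypeIdentifierStepCount")` would write one CSV for step counts; the output file name here is only an example.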
First, download or clone the code from github.com/markwk/qs_ledger.
Second, locate your apple health export and place it inside of qs_ledger’s sub-directory, “apple_health.” The end result should look something like this:
Third, start Anaconda and launch Jupyter Notebooks with Python 3. Alternatively, from the command line, you can launch it with the following command:
jupyter notebook
This will trigger a running process for Jupyter and open Jupyter notebook in a browser.
Fourth, in the browser, navigate to the local directory for qs_ledger. In my case, the address is: http://localhost:8888/tree/Development/Python/qs_ledger
It should look something like this:
Fifth, go into the apple_health directory and open apple_health_extractor.ipynb:
Sixth, check the top lines to confirm the location of the Apple export:
Seventh, select the cell and press SHIFT + ENTER to run it:
This process may take several minutes to run depending on the size of your Apple Health directory.
After completing the process, you should have several new files. Each file should include a CSV export of one health metric. This data should be both well-structured and quite verbose, meaning it contains everything originally stored in Apple Health. You can easily use any of these files to start your data analysis in your favorite data analysis tool like Tableau or even explore it in a spreadsheet application.
If you want to run some additional checks, run the remaining cells in apple_health_extractor.ipynb to check and count the data. Here is a sample of my weight data:
There are two important things to note about this data. First, the timestamps are in UTC time and haven’t been adjusted to a local timezone. Second, if you both wear an Apple Watch and carry your phone, then you’ll have some overlapping or duplicate data. We will correct these issues and a few other aspects in the next sections.
ALTERNATIVE: Converting Apple Health XML to Feather Data Model with Python
Another method for extracting and parsing your Apple Health data is to convert it to Feather. Feather is a relatively recent data storage format, and it is the method I recommend if you are a more serious data scientist. Feather allows data portability between R and Python, and it has some performance optimizations too. You can find good starter code here: github.com/mganjoo/apple-health-exporter
Processing and Exploring Apple Health Data with Python
Now that we’ve extracted our Apple Health data into a more usable format, it’s time to explore and process the data. As noted previously there are a couple issues with the data, namely the timestamps haven’t been localized and we have duplicate data from both the watch and phone. Additionally none of the data has been aggregated into anything useful like hourly or daily stats. Let’s correct these issues one by one.
Adjusting to Local Timezone
In qs_ledger/apple_health, open up the file apple_health_data_processor.ipynb. This file will walk us through a step-by-step method of fixing the timezones and assigning more time-based references, enabling us to aggregate the data into more useful stats.
In order to fix these timezone errors, we will use the python timezone package and leverage a couple of simple functions to convert from UTC to our own timezone. We then set a few relevant date references:
NOTE: To adjust to your timezone, replace “Asia/Shanghai” with your timezone selection.
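For readers who want to see the idea in isolation, here is a minimal sketch of the conversion. It is not the notebook's code: it swaps in the standard library's `zoneinfo` module (Python 3.9+) and assumes timestamps formatted like `2018-03-01 23:30:00 +0000`. The helper names `to_local` and `date_parts` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo  # standard library in Python 3.9+
    LOCAL_TZ = ZoneInfo("Asia/Shanghai")
except Exception:
    # Fallback if no timezone database is installed: Shanghai is UTC+8
    # with no daylight saving time.
    LOCAL_TZ = timezone(timedelta(hours=8))


def to_local(timestamp, tz=LOCAL_TZ):
    """Parse a 'YYYY-MM-DD HH:MM:SS +0000' string and convert it to tz."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S %z")
    return parsed.astimezone(tz)


def date_parts(local_dt):
    """Derive the time references used later for grouping."""
    return {
        "year": local_dt.year,
        "month": local_dt.strftime("%Y-%m"),
        "date": local_dt.strftime("%Y-%m-%d"),
        "hour": local_dt.hour,
        "dow": local_dt.weekday(),  # Monday=0 ... Sunday=6
    }


local = to_local("2018-03-01 23:30:00 +0000")
print(date_parts(local))
```

Note how a late-evening UTC record lands on the following local day in Shanghai, which is exactly why daily totals shift if the timezone is left unadjusted.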
Let’s start by walking through the steps data. Here is what the raw data looks like:
We can then parse out the date and time elements as Shanghai time.
Here is the result:
As you can see, we’ve adjusted and assigned various time references like year, month, date, hour, and day of the week. These will allow us to aggregate and calculate key statistics about steps.
Let’s start with a simple example using steps:
First, we group steps by the date and sum their value. We then create a new data frame called “steps_by_date.” We can then see each date has a steps total and export it to CSV.
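A hedged sketch of that grouping in pandas follows. The rows are made up, and the column names `date` and `value` are assumptions about how the processed steps data is laid out; the output column is named `Steps` to match the later snippets.

```python
import pandas as pd

# Made-up rows standing in for the timezone-adjusted steps data.
steps = pd.DataFrame({
    "date": ["2018-03-01", "2018-03-01", "2018-03-02"],
    "value": [1200, 800, 4300],
})

# Sum every record that falls on the same local date.
steps_by_date = (
    steps.groupby("date")["value"]
    .sum()
    .reset_index(name="Steps")
)
print(steps_by_date)

# Uncomment to save the daily totals (the file name is only an example).
# steps_by_date.to_csv("steps_per_day.csv", index=False)
```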
There is one problem though. This includes both step data from both the watch and phone. While other data points are unique, steps is a special example since it is collected on both the phone and watch. Let’s clean this up in the next section.
Adjusting for Overlapping Data from Apple Watch and iPhone
Before getting too excited about how many steps you are doing, it’s important to adjust this for duplicate steps we are collecting on both the phone and watch.
Run the following commands, which will separate step counts by device and year:
Here are my results:
As you can see, I have markedly different results on the phone and watch, and combining the two inflates my step count by nearly 70%!
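A per-device, per-year breakdown of this kind could be produced along the following lines. This is a sketch with made-up numbers, and the column names `sourceName`, `year` and `value` as well as the device names are assumptions.

```python
import pandas as pd

# Made-up yearly totals per device, for illustration only.
steps = pd.DataFrame({
    "sourceName": ["MyAppleWatch", "MyiPhone", "MyAppleWatch", "MyiPhone"],
    "year": [2017, 2017, 2018, 2018],
    "value": [5000, 3500, 6000, 4000],
})

# One row per year, one column per device, summed step counts in the cells.
by_device = steps.pivot_table(
    index="year", columns="sourceName", values="value", aggfunc="sum"
)

# Naively adding both devices together shows how much double counting occurs.
by_device["Combined"] = by_device.sum(axis=1)
print(by_device)
```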
In order to get more accurate results, we need to make some adjustments. The simplest and easiest option is to simply use your watch steps. In my case, since for nearly all of my walking I am wearing my smart watch, this is the best option too.
First, get the device names:
steps.sourceName.unique()
Second, configure the following command with your watch name and run:
steps = steps[steps.sourceName == 'MyAppleWatch']
This change filters by only the watch data and now gives us a more accurate total step count.
There is probably a more sophisticated way to get these numbers without completely dropping the phone data. But as a starting point this is a good solution.
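One possible refinement, offered only as a sketch, is to prefer the watch but fall back to the phone for hours in which the watch recorded nothing (for example, while it was charging). The device names, the column names and the hour-level granularity are all assumptions, and this heuristic has not been validated against real exports.

```python
import pandas as pd

WATCH = "MyAppleWatch"  # replace with your own device names
PHONE = "MyiPhone"

# Made-up records: the watch has no data for hour 9.
steps = pd.DataFrame({
    "sourceName": [WATCH, PHONE, PHONE, WATCH, PHONE],
    "date": ["2018-03-01"] * 5,
    "hour": [8, 8, 9, 10, 10],
    "value": [900, 700, 400, 1500, 1100],
})

# Sum steps per device for each hour of each day.
hourly = steps.pivot_table(
    index=["date", "hour"], columns="sourceName", values="value", aggfunc="sum"
)

# Prefer the watch; use the phone only for hours with no watch records.
hourly["Steps"] = hourly[WATCH].fillna(hourly[PHONE])
print(hourly["Steps"].sum())
```

This assumes both devices appear at least once in the data; if one is missing entirely, its column will not exist and the lookup will fail.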
Simple Data Explorations and Visualizations with Python
Using our processed data, we can create various visualizations.
Rolling Mean Step Count
Let’s start by looking at the rolling mean of our step count. While the daily step count shows considerable variance on high and low days, a better approach is looking at the 10-day average or rolling mean. Here is the code we can use to calculate this:
steps_by_date['RollingMeanSteps'] = steps_by_date.Steps.rolling(window=10, center=True).mean()
It’s then simple to visualize this with Matplotlib:
steps_by_date.plot(x='date', y='RollingMeanSteps', title= 'Daily step counts rolling mean over 10 days', figsize=[10, 6])
Here is the result:
Why is there a spike in step count in the last few months of 2017? I was training for a marathon, which included two half marathons.
Step Counts by Days of the Week
First, let’s create a day of the week column:
steps_by_date['date'] = pd.to_datetime(steps_by_date['date'])
steps_by_date['dow'] = steps_by_date['date'].dt.weekday
We can then use some simple code to group the data by day of the week and get the mean for each day:
data = steps_by_date.groupby(['dow'])['Steps'].mean()
And then chart it:
Here’s the visualization:
Not surprisingly, I get more exercise and running on the weekend. Thus, I see a higher daily average on Saturday and Sunday. While 10k steps is a pseudo-science number, I’m still happy to see my weekday step count is above 10,000 steps.
Additional Step Visualizations
Here are two other visualizations that attempt to show some of my walking trends: Number of Steps Per Month and Steps by Hour of the Day.
These are just a few examples using steps data from Apple Health. Additionally, it is quite simple to adapt this code for other data points like Sleep, HRV, weight, and others.
Conclusion:
In this post, we looked at how to export, parse and extract health data from Apple Health. The easiest starting points are apps like QS Access and Apple Health Workout Exporter, which allow you to access your data in a spreadsheet format. We specifically looked at the raw XML export from Apple Health and how to parse it first into a CSV and, in turn, into timezone-adjusted and aggregated stats. This then allowed us to see various patterns like my rolling average of steps per day, days of the week with the most steps and even which hours of the day I’m typically walking. This code and these approaches can provide a great starting point for exploring any of the other health data points collected by your iPhone or Apple Watch.
While much of the discourse recently has focused on data leaks and data privacy, I find this misses the incredible opportunity for self-understanding and self-improvement we can gain from personal data and self-tracking. It’s somewhat ironic that so many people clamor for data privacy when most don’t even know what data they have. I don’t disagree that data privacy is a critical topic both today and in the year to come, but I encourage a healthy engagement with your data too.
For the past few years, I’ve been writing and building data tools for both tracking and exploring personal data. One dimension of my work aims at tracking new data points. I’ve built a web app to log your podcast listening called www.PodcastTracker.com, and I’ve created a photo analytics and auto-tagger app for iOS and Android called PhotoStats. PhotoStats lets you view your photo-taking life and auto-tags photos so you know what you take pictures of.
Another dimension of my data work is aggregating, combining and understanding our personal data. We can now collect data on our time, our habits, our media watching and much else. This results in more data, but often it is siloed in each service, leading to personal data fragmentation. If you want to understand and leverage your data, you need to get some kind of data convergence. This is why I’ve created an open source project called Quantified Self Ledger. The primary goals are to help aggregate your data, to build a personal data dashboard, and hopefully one day to build some more advanced data analytics with machine learning and artificial intelligence. The project already includes integrations for Fitbit, RescueTime, Kindle Highlights, Last.fm, Todoist and Toggl, and, as we’ve looked at in detail in this post, Apple Health Data.
People track their lives for various motivations and reasons, but the two most significant in the research are “self-healing” and “self-design.” Specifically, for self-healing, people track their fitness and health in order to better deal with a disease or injury and navigate concerns with their existing medical care and doctors. Self-design refers to the idea that people use data and self-tracking as a way to support and create their kind of life.
Whatever your reasons for tracking, I believe it’s important to do more than just track; the key is to engage with your data. Tracking and online tools should and often do provide data accessibility and exports. This data, like Apple Health data, can be used with spreadsheets or more complicated data science tools to help you understand your current health status and even better support a new and improved “data-driven you.”
Best of luck and happy tracking!
APPENDIX:
SOURCE: https://www.statista.com/chart/13115/worldwide-wearable-device-shipments/