For instructions on how to use the relational database, click here.
- Data Content & Sources
- Data Freshness, Accuracy
- File Formats
- Data Structure
- Table and Field List
- Sample Data
- System Requirements
- Technical Support
- Trusted Locations in Microsoft Access®
Description of the Data
The CarePrecise data universe is a modular family of datasets, together representing a single relational database spanning a broad healthcare provider information space. The color squares below indicate which module the data belongs to. Refer to the color coding in the Fields List below to view field descriptions for each module.
We provide both Microsoft Access MDB files and standard comma delimited files (CSV) for all provider data tables.
CPAC: CarePrecise Access Complete™ healthcare provider data
CPAC contains substantial information on every one of the 6.4 million HIPAA-covered healthcare providers. The field list below includes descriptions of data available in the basic CPAC dataset (color-coded by a gray background). The CPAC data, like all CarePrecise data products, is a relational database, generally using the NPI number as the key across all tables.
Historical Data Included. The CPAC dataset also includes certain historical data, that is, information on deactivated providers. Additional historical data is also available. See this link.
NEW: We've recently added two new tables:
Review their contents below in the fields list.
EPGH: Extended Professional, Group and Hospital™ data
The EPGH module adds significant details to the CPAC data for a subset of providers; specifically, those providers who are actively seeing patients and who appear on Medicare claims. The field list below includes descriptions of data available in bundle products that include the Extended Professional, Group and Hospital (EPGH) dataset, indicated by the blue background.
The extended dataset offers additional information on the nation's active PECOS-enrolled providers, and includes a deduplicated database of approximately 5,000 hospitals, 180,000 practice groups, and 900,000 individual providers (primarily physicians) -- all linked together to provide a 360-degree view of the active provider universe. Physician records are linked to their practice group records, and to their affiliated hospital records.
The EPGH data makes it possible to pull a simple list of hospitals and their affiliated physicians or practice groups, or to pull a list of practice groups and view a verified list of medical professionals affiliated with the groups.
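As a rough illustration of the kind of lookup described above, the following Python sketch builds a tiny in-memory SQLite database and lists hospitals with their affiliated physicians. The table names (hospitals, professionals, affiliations), the column names, and the sample rows are all invented for this example and are not the actual EPGH schema; refer to the fields list for the real table and field names.

```python
import sqlite3

# Hypothetical miniature schema; real EPGH table and field names differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hospitals (ccn TEXT PRIMARY KEY, hospital_name TEXT);
    CREATE TABLE professionals (npi TEXT PRIMARY KEY, last_name TEXT);
    CREATE TABLE affiliations (npi TEXT, ccn TEXT);
    INSERT INTO hospitals VALUES
        ('000001', 'Example General'),
        ('000002', 'Sample Memorial');
    INSERT INTO professionals VALUES
        ('1000000001', 'ADAMS'),
        ('1000000002', 'BAKER'),
        ('1000000003', 'CLARK');
    INSERT INTO affiliations VALUES
        ('1000000001', '000001'),
        ('1000000002', '000001'),
        ('1000000003', '000002');
""")

def physicians_by_hospital(connection):
    """Return {hospital_name: [last_name, ...]} using the affiliation link table."""
    rows = connection.execute("""
        SELECT h.hospital_name, p.last_name
        FROM hospitals AS h
        JOIN affiliations AS a ON a.ccn = h.ccn
        JOIN professionals AS p ON p.npi = a.npi
        ORDER BY h.hospital_name, p.last_name
    """)
    roster = {}
    for hospital_name, last_name in rows:
        roster.setdefault(hospital_name, []).append(last_name)
    return roster

hospital_roster = physicians_by_hospital(conn)
```

The same two-join pattern, with practice groups in place of hospitals, would give a list of groups and their affiliated professionals.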
Included with the EPGH module are group practice details, medical school, graduation year, and more.
The extended hospital data includes the Medicare CCN number used to link to other data resources, such as hospital quality data found in the Authoritative Hospital Database.
The extended professionals and groups data includes the key to physician and practice group quality data available from CMS.
The EPGH dataset is not sold separately as a standalone database, but as part of the Master Bundle. It is also included in the CarePrecise Platinum package.
ProCase™ properly-cased data
The ProCase module offers the same individual name data, organization name data and address data found in the CPAC module, but in properly-cased form. Using a large database of properly-cased name and address listings, plus a proprietary algorithm, a very high quality of intelligent proper-casing is achieved. ProCase data tables are shown below with a pink background.
The ProCase module is available as an add-on to CPAC, Master Bundle, Platinum or Gold packages, and is not available separately. To order the ProCase add-on, please contact CarePrecise Sales or call (877) 782-2294.
The Authoritative Physician Database™
The Authoritative Physician Database (APD) contains detailed data on all U.S. HIPAA-covered physicians. Although the APD is available separately as a standalone file, it may also be linked with data in other CarePrecise datasets using the NPI, PACID, and CCN fields. The data table shown below with a purple background represents data included in the Authoritative Physician Database.
See product page or purple highlighted section in table below for data definitions.
The Authoritative Hospital Database™
The Authoritative Hospital Database (AHD) contains deeper data on the widest set of U.S. hospitals. Although the AHD is available separately as a standalone file, it may also be linked with data in the CPAC and EPGH datasets using the NPI and CCN fields. The data table shown below with a green background represents data included in the Authoritative Hospital Database.
See product page or green highlighted section in table below for data definitions.
Additional data contained in Gold and Platinum packages
Data tables shown below with a gold background represent additional data included in the CP ListMaker software bundled with CarePrecise Gold and CarePrecise Platinum. This data is made available primarily for operation of the CP ListMaker software, and may also be used on its own. This data includes geocoding (latitude and longitude), wealth of the provider's service area, urban or rural status, and more.
CarePrecise Access is distributed as normalized relational database tables, keyed by the National Provider Identifier (NPI) number, plus a core Microsoft Access database that links the tables together and includes query examples, and CSV versions of the normalized data files. Additional tables are included which contain records that have been newly added, newly dropped, the complete excluded providers database, and tables defining codes used in the database.
To use the CSV or MDB files, which are organized as a relational database linked in most cases by the NPI number, you will need relational database management software, such as Microsoft Access (part of Office) version 2007 or later, FileMaker, or a similar product. If you are using Microsoft Access, open the file named NPPES_Core.mdb if you are using only the CPAC, Select or State product, or Extended_Core.mdb if you are using the Master Bundle product.
CarePrecise data packages contain scaled data from federal and other sources, including the National Plan and Provider Enumeration System (NPPES) and the Provider Enrollment, Chain and Ownership System (PECOS), both maintained by the Centers for Medicare and Medicaid Services (CMS), a part of the U.S. Department of Health and Human Services (HHS). Fraud warnings from the federal List of Excluded Individuals/Entities (LEIE) database are matched to provider records using an algorithmic record linking system*. No NPPES data field is left out of the CarePrecise Access dataset, but additional data may be added or conformed to improve comprehension or searchability, depending on the specific CarePrecise product; for instance, 2-character state codes and UPIN numbers are conformed where appropriate. PECOS status (whether or not a provider is enrolled to be able to bill Medicare) is included. Taxonomy descriptions (plain English translations of specialty and subspecialty codes) are included in all packages. Market demographic data (economic, social) is included in some packages.
All healthcare providers are included. Users can export specific types of provider, e.g., physicians or hospitals, by querying on the specific taxonomy (specialty) code(s).
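A minimal sketch of such a taxonomy query, written in Python against an in-memory SQLite database. The table names (providers, taxonomy), the column names, and the rows are hypothetical stand-ins rather than the actual CarePrecise schema, and the taxonomy codes are used only as sample values; look up the codes you need in the taxonomy descriptions included with your package.

```python
import sqlite3

# Hypothetical miniature schema; the real CarePrecise table and field names differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE providers (npi TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE taxonomy (npi TEXT, taxonomy_code TEXT);
    INSERT INTO providers VALUES
        ('1000000001', 'ADAMS'),
        ('1000000002', 'EXAMPLE GENERAL HOSPITAL'),
        ('1000000003', 'BAKER');
    INSERT INTO taxonomy VALUES
        ('1000000001', '207Q00000X'),
        ('1000000002', '282N00000X'),
        ('1000000003', '207Q00000X'),
        ('1000000003', '207R00000X');
""")

def providers_with_taxonomy(connection, codes):
    """Return (npi, name) for providers holding at least one of the given codes."""
    placeholders = ", ".join("?" for _ in codes)
    sql = f"""
        SELECT DISTINCT p.npi, p.name
        FROM providers AS p
        JOIN taxonomy AS t ON t.npi = p.npi
        WHERE t.taxonomy_code IN ({placeholders})
        ORDER BY p.npi
    """
    return connection.execute(sql, list(codes)).fetchall()

family_medicine = providers_with_taxonomy(conn, ["207Q00000X"])
hospitals = providers_with_taxonomy(conn, ["282N00000X"])
```

DISTINCT matters here because a provider may report more than one taxonomy code, and the join would otherwise return that provider once per matching code.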
Co-location codes, which identify providers practicing at the same location, are established by CarePrecise, and are exclusive to CarePrecise products.
CarePrecise Access data is updated every month, usually within a few days of release of the updated source data by federal agencies. Unlike many in-house systems and other NPI data services, the CarePrecise system generates no exceptions in processing and matching the core provider data. Any exception is corrected through successive processing loops, and our relational design virtually eliminates query failures. The data is authoritative because CarePrecise datasets are strictly faithful to the source data. Original content (extrapolated and algorithmically matched data) included in CarePrecise Access, including LEIE and EPLS excluded providers/vendors data, which share no common unique identifier with the NPPES data*, is subjected to rigorous record-linkage processing.
The NPPES (National Plan and Provider Enumeration System), maintained by the Centers for Medicare and Medicaid Services, is one of the most accurate resources for provider data. Providers are required by law to update their record within thirty days of a change. No database is perfect; however, the NPPES gets high marks in recent studies.
"NPPES and SK&A had the highest rates of matching mailing address information, while the AMA Masterfile had low rates compared with the NPPES... the NPPES and to a lesser extent, the SK&A file, appear to provide reasonably accurate, up-to-date address information for physicians billing public and [private] insurers."
"The Results Are Only as Good as the Sample: Assessing Three National Physician Sampling Frames," Journal of General Internal Medicine. 2015. Catherine M. DesRoches, Dr.P.H., et al.
The NPPES database scored best at 94% accuracy, the SK&A at 92%, while the AMA Masterfile had significantly lower rates of correct address information across all specialties scoring only 54%. That said, CarePrecise goes beyond the NPPES data, merging in provider name and address information extracted from millions of Medicare claims through a 12-month window, offering additional practice location and group affiliation data that is frequently the most up-to-date available, surpassing our competitors' offerings and even the AMA Masterfile and online provider network directories of some of the nation's largest health insurance plans.
Solid, comprehensive, and authoritative, CarePrecise products are used to create and update network directories, to investigate fraud, waste, and abuse across the healthcare industry, to organize clinical and market research, and in a broad variety of Web applications. Licensing is available for all of these special uses.
Because CarePrecise does not control the data content, products are offered as-is and no warranty as to accuracy, timeliness, or fitness for a particular purpose is expressed or implied.*
Two databases are included in the CarePrecise Access Complete data distribution, NPPES_Deact and NPPES_Delta, to provide historical information on providers. The tDeact table lists all NPI numbers that have been deactivated since the beginning of the NPI enumeration system, going back to 2005, showing only the NPI and the date of deactivation. The Delta databases include complete data on the deactivated provider records, showing all of the data from the CarePrecise Access Complete dataset, going back to 2011. Another table, NPPES_Added, provides historical dates of when all providers, even those deactivated from the current database, were first added to the system. Each record in the NPPES_Delta tables represents a rich provider record that has been deactivated by CMS and no longer appears in the current month's data distribution, but does remain in the exclusive CarePrecise delta database, to which each month's deactivated records are added. See the fields list above for descriptions of the delta tables.
An additional specialized product, the Physicians' Sanctions, Reinstatements, Deactivations and Reactivations (SRDR) Database, is also available for use in determining a physician or other healthcare provider's eligibility to bill insurance at a given point in time going back to 2009. Not included in the standard products listed here, the SRDR includes not only dates of activation and deactivation, but also dates of exclusions (sanctions) and reinstatements, along with reasons for the exclusions. Pricing can be found on the Historical Data page. Contact us for additional information on the SRDR.
Added & Dropped Provider Records
These two data tables are generated by CarePrecise and included in the CarePrecise Access Complete dataset, representing provider records that have been added to or dropped from the NPPES database upon the monthly updates. The tables grow each month; each record is dated so that it is possible to know when a provider record was added or dropped. The dropped records table includes basic name and contact information, as well as the NPI number.
Group CoLoCode™ Data
Two separate but complementary data sources now make it possible to identify group practices. The first, group data provided by CMS, identifies providers who have reported a multi-specialty group or single-specialty group practice. This particular data is extracted every month from Medicare claims, and represents a moving 12-month window on individuals' group affiliations. We provide both a "Group" flag in the provider's data, and a relational table with the reported codes (tTaxoGroup).
The second tool we provide to identify groups is the co-location code table (tCoLoCode). The CoLoCode is an algorithmic derivation from provider practice data, which facilitates querying providers by practice location. Our QoRelate™ record linking system standardizes and conforms location data beyond postal specifications to link providers practicing together.
These data can be used together to successfully identify physician groups and all of their specialties, and the CoLoCode data can be used to identify any provider type in a co-located group practice, including dentists, optometrists, radiologists, etc. The CoLoCodes can also be used to identify physicians and others working within large health systems, universities and other institutions.
You can select by size of the practice (number of co-located providers), choosing a range (e.g., between 5 and 50 providers).
Example Application: The co-location codes let you find large (or small) clinics, based on the Count field, which shows the number of providers in the database with the same practice location. For instance, while the code UICFLG70MMPNPNHRR seems meaningless, it happens to be a CoLoCode for the Cleveland Clinic, and it can instantly filter the database to show you all of the 2700+ providers practicing at the Cleveland Clinic's 9500 Euclid Avenue site.
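The practice-size selection can be sketched in Python with an in-memory SQLite database. The table name (colocodes), its columns, and the codes SITE_A, SITE_B and SITE_C are made up for the illustration; the shipped tCoLoCode table already carries a Count field, whereas this sketch derives the count with GROUP BY so that it stays self-contained.

```python
import sqlite3

# Hypothetical stand-in for the co-location table; real field names differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colocodes (npi TEXT PRIMARY KEY, colocode TEXT)")
sample = (
    [("20000000%02d" % i, "SITE_A") for i in range(3)]    # 3 co-located providers
    + [("30000000%02d" % i, "SITE_B") for i in range(1)]  # a solo practice
    + [("40000000%02d" % i, "SITE_C") for i in range(6)]  # 6 co-located providers
)
conn.executemany("INSERT INTO colocodes VALUES (?, ?)", sample)

def locations_by_size(connection, min_providers, max_providers):
    """Return (colocode, provider_count) for locations within the size range."""
    rows = connection.execute(
        """
        SELECT colocode, COUNT(*) AS provider_count
        FROM colocodes
        GROUP BY colocode
        HAVING COUNT(*) BETWEEN ? AND ?
        ORDER BY colocode
        """,
        (min_providers, max_providers),
    )
    return rows.fetchall()

small_sites = locations_by_size(conn, 2, 5)
larger_sites = locations_by_size(conn, 5, 50)
```

Filtering on a single code instead of a size range works the same way: select the rows whose colocode equals the code of interest.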
The most reliable method for identifying group practices' members is provided in the EPGH dataset, which captures individual and group links from Medicare claims.
CarePrecise Access is delivered in two file formats, each containing identical data:
- Microsoft Access (.mdb); normalized relational tables, plus a central core database with query examples
- Comma Separated Values (.csv), supporting FileMaker, MySQL, MSSQL and other database systems, on Windows, Mac or Linux, in normalized relational tables
The CP ListMaker component of CarePrecise Platinum and CarePrecise Gold requires Microsoft Access 2007 or later.
You can use built-in utilities in the MDB application to export the data in most common data formats, including CSV, TXT, MDB, ACCDB and others. Application extensions are available for conversion to MySQL, among other formats. Conversion to MSSQL is done within SQL Server.
Query examples are included in the NPPES_Core.mdb database application included with the CarePrecise package. Queries are performed within this application using the extensive and user-friendly tools provided by Microsoft Access. Refer to the examples, opening them in design view, for a detailed understanding of how the CarePrecise data is queried. The CarePrecise data structure makes many NPPES queries possible that simply cannot be done on ordinary desktop and laptop computers with other NPPES products.
Keying To Legacy/In-House Data
CarePrecise data includes many "hooks" that can be used to key legacy provider databases to the NPI. NPI numbers, telephone and fax numbers and UPINs are among the best general unique identifiers for matching to legacy data. Providers have a strong incentive to report all "Other Identifiers" that apply to them, as these are used by insurance payers in claims crosswalks that expedite reimbursement, and even increase reimbursement when preferred credentials and specialties are able to be linked to the claim, through such identifiers as UPIN, Medicare PIN, and others. Hooks for linking legacy data to CarePrecise include 10-digit telephone numbers for more than 99.8% of providers, 9-digit zip codes on nearly every address, and more than 700,000 physician UPINs, as well as complete name data with expanded AKA names, and complete practice and mailing addresses.
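One way to use the telephone hook is sketched below in Python: phone numbers on both sides are reduced to 10 digits, and a legacy record is keyed to an NPI only when exactly one provider shares that number. The record layouts (legacy_id, phone, npi) and the sample values are assumptions made for the example, not CarePrecise field names, and a production match would normally combine several hooks rather than rely on phone alone.

```python
import re

def normalize_phone(raw):
    """Reduce a phone string to its 10 digits, or return None if that is not possible."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    return digits if len(digits) == 10 else None

def key_legacy_to_npi(legacy_records, npi_records):
    """Map legacy_id -> NPI where the phone matches exactly one NPI record."""
    by_phone = {}
    for rec in npi_records:
        phone = normalize_phone(rec["phone"])
        if phone:
            by_phone.setdefault(phone, []).append(rec["npi"])
    matched = {}
    for rec in legacy_records:
        candidates = by_phone.get(normalize_phone(rec["phone"]), [])
        if len(candidates) == 1:  # shared numbers are ambiguous and are skipped
            matched[rec["legacy_id"]] = candidates[0]
    return matched

# Invented sample data for illustration only.
legacy = [
    {"legacy_id": "A1", "phone": "(555) 010-0001"},
    {"legacy_id": "A2", "phone": "555.010.0002"},
    {"legacy_id": "A3", "phone": "n/a"},
]
npi_side = [
    {"npi": "1000000001", "phone": "5550100001"},
    {"npi": "1000000002", "phone": "1-555-010-0002"},
    {"npi": "1000000003", "phone": "555-010-0002"},
]
keyed = key_legacy_to_npi(legacy, npi_side)
```

Records left unmatched here (a shared front-desk number, a missing phone) are the ones to retry with another identifier such as the UPIN or the name and address fields.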
Use the NPPES_Core.mdb database application included with the CarePrecise package in Microsoft Access to create reports using the extensive report creation capabilities of Microsoft Access. NPPES_Core contains sample queries that can be used to pull targeted data.
NOTE: If you need a completely user-friendly, easy-to-use software application for preparing and exporting lists, Excel output, or reports without database skills, see CarePrecise Gold or CarePrecise Platinum.
Healthcare providers are required to enumerate in the NPPES database if they are a HIPAA covered entity (that's a much abbreviated statement; see the official CMS documentation for more detail), and must enroll in the PECOS database to be eligible to bill Medicare. Under the NPI rules there are two types of providers, Type 1 and Type 2. Essentially, Type 1 providers are individual persons providing healthcare, such as physicians, nurses and psychologists among others, while Type 2 providers are organizations or facilities, such as hospitals, labs and pharmacies among others. We say that Type 1 providers are a "face" and Type 2 providers are a "place."
Type 1 providers may have only one NPI number for themselves. But they may have an additional NPI for their corporation. Type 2 providers may have as many NPI numbers as they wish, and typically have multiple NPIs to distinguish business units, and/or to assist with reimbursement pay flow.
With more than 6.4 million rows of provider data, and because the database is organized in a normalized relational format, a database application is the ideal environment for use. We provide a "core" database shell that links all of the table files together into a single database environment for use in Microsoft Access. For the CPAC dataset only, use the file named NPPES_Core.mdb. For the Master Bundle (CPAC + EPGH datasets), use the file named Extended_Core.mdb. MDB files, rather than ACCDB files, are provided to offer better backward compatibility with Microsoft Access versions, but this is subject to change to ACCDB format in the future. The CP ListMaker component of CarePrecise Platinum and CarePrecise Gold is a powerful tool for extracting specific types of providers by a wide range of criteria, including geographic location, specialty, gender and other data points.
Microsoft Access Alternatives
Don't have Microsoft Access, but want to use the full CarePrecise database anyway? No problem. You can either import the data into your own system, or get an Access alternative. OpenOffice Base is free, is compatible with our files to allow you to do nearly any kind of lookup, and is available for Windows, Linux and Mac (Linux and Mac versions require you to import our data rather than use the files directly). Ability Pro is another compatible database product offering more robust features, and is priced under $50. If you want an environment that works on Mac, Linux and Windows, get Kexi, another free solution (Kexi has its own file format, so you'll import our data into it, rather than using our files directly).
Because these files contain more records than Excel can open, you will need to use the Excel Power Pivot add-in if you wish to view all of the more than 6.4 million rows. (Power Pivot is not required to use Excel files having fewer than 1,048,576 records exported from the Gold or Platinum CP ListMaker application.) The Power Pivot add-in has been distributed with Excel since the 2013 version, but Excel 2010 can use the Power Pivot add-in as well, available as a free download from Microsoft. However, Excel is decidedly not the best method for working with the full database of 6.4 million records; a Database Management System (DBMS), such as Microsoft Access (part of Microsoft Office Pro and Office 365), FileMaker Pro, or other database software, is a better choice.
CarePrecise data is single-user licensed for use by the purchaser and may not be redistributed in any form and may not be exposed in full or in part by any means including a website without special licensing. Review the Single User License Agreement here. Learn about special licensing on the Licensing page, and contact CarePrecise Sales for information.
If you are having difficulty exporting or converting the data to a particular form required by your application, please contact us at (877) 782-2294 for assistance with advanced data services.
For use as CSV data files, database software that can open comma delimited files is required. For operation as a single joined dataset running in Microsoft Access, Microsoft Access 2007 or later is required. You should have a minimum of 4 gigabytes of RAM. Approximately 6GB of disk space should be allotted for storage of the MDB files, plus additional disk space when manipulating the data. The CSV files may be imported into any application that supports comma separated value files and supports more than 8 million rows of data.
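If you choose to import the CSV files into your own system, loading them in batches keeps memory use modest even with millions of rows. The Python sketch below loads a CSV into SQLite; SQLite is simply one convenient target chosen for the example, and the table name, column names and sample rows are invented. Every column is created as TEXT so that leading zeros in identifiers and ZIP codes survive the import.

```python
import csv
import io
import itertools
import sqlite3

def load_csv(connection, table_name, csv_file, batch_size=50_000):
    """Create table_name from the CSV header and insert rows in batches.

    Returns the number of data rows loaded.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    connection.execute(f'CREATE TABLE "{table_name}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    insert_sql = f'INSERT INTO "{table_name}" VALUES ({placeholders})'
    total = 0
    while True:
        batch = list(itertools.islice(reader, batch_size))
        if not batch:
            break
        connection.executemany(insert_sql, batch)
        total += len(batch)
    connection.commit()
    return total

# A three-row stand-in for a real data file (hypothetical column names).
sample_csv = io.StringIO(
    "NPI,LastName,Zip\n"
    "1000000001,ADAMS,01234\n"
    "1000000002,BAKER,12345\n"
    "1000000003,CLARK,23456\n"
)
conn = sqlite3.connect(":memory:")
rows_loaded = load_csv(conn, "providers", sample_csv, batch_size=2)
```

For a real file, pass a file object opened with newline="" in place of the StringIO object, and consider adding an index on the NPI column once the load has finished.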
Different CarePrecise product packages come with different levels and periods of free technical support. Refer to the support page for details. Paid advanced technical support packages are also available.
Refer to the CMS field descriptions for technical descriptions of the data fields. CarePrecise uses abbreviated field names; see the Data Structure table above for NPPES-to-CarePrecise field name comparisons.
Refer to the CMS field codes document for details on field codes used in CarePrecise.
Consult the License Agreement for information on permissible uses of CarePrecise data.
Product Evaluation Sheet may be downloaded here.
Sample data may be downloaded here.
Read about our record linkage technology for combining data from multiple sources.
Query Running Speed
Because CarePrecise datasets are very large (CarePrecise Access Complete contains more than 6.4 million NPI records normalized into ten tables, comprising about 1 billion data points), some complex queries can take a long time to run. If you are unaccustomed to working with very large data files, this may take some getting used to. But you can do some things to dramatically improve running speed. Here are some tips:
- Close other programs on your computer. Consider not running complex Access queries on the complete CarePrecise dataset while other processor-hungry programs are operating, whether you are actively using them or not. In particular, Microsoft Outlook can cause a serious reduction in processing speed.
- Make sure you have adequate virtual memory configured. Consult your Windows documentation for instructions on setting virtual memory (search Windows Help for "virtual memory"). We use the following settings for virtual memory: Initial 2046 MB; Maximum 4092 MB; other settings at default.
- Disable unnecessary Windows Services. Refer to the documentation for your version of Windows to locate the Services window. In particular, the Windows Index service used by the desktop search tool can dramatically affect processing speed. Same goes for third party indexing tools such as the Google Desktop Search tool.
- Pause or Disable Carbonite Backup. Carbonite and other background backup programs can cause slow-downs when running Microsoft Access processes. Pause or disable these processes while running queries on large datasets.
- Don't allow your antivirus program to run a system scan while you are using the database. Likewise, disable any background tools that hog system resources, such as scheduled backups. It's best to schedule such operations to run at times when you will not be using the computer anyway, but, surprisingly, the default settings of some programs will attempt to run any time your keyboard and mouse are idle, even if the processor is busy trying to do database work.
- Make sure you have adequate contiguous disk space available. As time passes, your computer's hard disk becomes fragmented -- especially when you use very large databases like those from CarePrecise. Unless there are large areas of free space on your hard drive for CP ListMaker to use, it can take much longer for processes to run, or processes may fail to complete. We recommend periodic defragmentation of the disk which hosts your CarePrecise data if you are frequently using Microsoft Access "Make Table" or "Append" operations. We also recommend defragmenting after a monthly update of the CarePrecise Access Complete dataset, due to the large disk areas that are overwritten.
- Don't use CP ListMaker over a network. Use of CarePrecise data products over a network will not only slow down the program, but can cause network issues for other users. CP ListMaker is licensed only for use on a single computer (standard license).
- Periodically "Compact and Repair." As you use a Microsoft Access database it can become bloated with unnecessary page fragments. Refer to the help file for your version of Access to learn how to Compact and Repair your databases. The NPPES_Core.mdb automatically Compacts/Repairs itself when you close it. However, the linked NPPES databases do not, and generally will not need it unless you make changes to the data. To Compact/Repair the linked NPPES databases, open NPPES_Core.mdb, open the System Utilities form, then click the "Compact/Repair All Databases" button. This will run a safe Compact and Repair on your primary CarePrecise data files.
- Configure your hard drives/arrays for optimum speed. Consult your system administrator to see if placing CarePrecise on a different hard drive from Microsoft Access will improve data processing speed. Some disk configurations can "double dip" (let parts of programs access two drives simultaneously), improving data access operations.
- Avoid using the "Like" operator in your queries. The Like operator requires deep parsing and slows down query processing.
- Optimize your queries. Review how you can optimize Access query operation.
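The cost of pattern matching can be seen in most database engines, not only Access. The Python sketch below uses SQLite as a stand-in (Access itself is not involved, and its query planner differs) to compare the plan for a wildcard LIKE search with the plan for an exact match on the same indexed column. The table, index and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE providers (npi TEXT PRIMARY KEY, last_name TEXT)")
conn.execute("CREATE INDEX idx_last_name ON providers (last_name)")
conn.executemany(
    "INSERT INTO providers VALUES (?, ?)",
    [("1000000001", "SMITH"), ("1000000002", "SMITHSON"), ("1000000003", "JONES")],
)

def query_plan(connection, sql, params):
    """Return SQLite's plan descriptions (last column of each EXPLAIN QUERY PLAN row)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Wildcard search: every row has to be examined.
like_plan = query_plan(
    conn,
    "SELECT npi, last_name FROM providers WHERE last_name LIKE ?",
    ("%SMITH%",),
)

# Exact match: the index on last_name can be used to jump to the rows.
exact_plan = query_plan(
    conn,
    "SELECT npi, last_name FROM providers WHERE last_name = ?",
    ("SMITH",),
)
```

In SQLite the first plan reports a full scan and the second an index search. Where a query can be expressed as an exact match on a code or identifier (for example a taxonomy code rather than a fragment of its description), that form is generally the faster one.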
Understanding Algorithmically Derived Fields
In order to bring additional functionality to a dataset, it is often necessary to create a datapoint that doesn't exist in the source data, and use an algorithm to populate it based on existing data. For instance, in order to find all of the records located at the same physical address – data which may be "fat fingered" or entered in various different ways – an algorithm can be used to "clean up" location data and create a new field. Such is the case with the CarePrecise CoLoCode™ field.
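To make the idea concrete, here is a deliberately simple Python sketch of a derived location key. It is not the CoLoCode algorithm, which is proprietary and considerably more thorough; the abbreviation table, the hashing step, and the sample addresses are all assumptions chosen for illustration.

```python
import hashlib
import re
from collections import defaultdict

# A tiny, illustrative abbreviation table; real address standardization needs far more.
ABBREVIATIONS = {
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "ROAD": "RD",
    "STREET": "ST",
    "SUITE": "STE",
}

def location_key(address, zip_code):
    """Derive a short key from a cleaned-up address plus the 5-digit ZIP."""
    text = re.sub(r"[^A-Z0-9 ]", " ", address.upper())  # drop punctuation
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    canonical = " ".join(tokens) + "|" + zip_code[:5]
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12].upper()

def group_by_location(records):
    """Return {location_key: [npi, ...]} for an iterable of provider records."""
    groups = defaultdict(list)
    for rec in records:
        groups[location_key(rec["address"], rec["zip"])].append(rec["npi"])
    return dict(groups)

# Invented records: the first two are the same place entered two different ways.
records = [
    {"npi": "1000000001", "address": "100 Main Street, Suite 200", "zip": "12345"},
    {"npi": "1000000002", "address": "100 main st ste 200", "zip": "12345-6789"},
    {"npi": "1000000003", "address": "200 Oak Road", "zip": "12345"},
]
groups = group_by_location(records)
```

Once the key exists as a field, finding everyone at the same location becomes an ordinary exact-match query on that field.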
In another instance, CarePrecise uses an algorithm to get an NPI number into the available PECOS data of practice groups and hospitals (in the EPGH dataset) so that it can be used (judiciously) to link to an organization's record in the CPAC dataset. Because a "fuzzy logic" algorithm is used to find the NPI, some errors do occur, so that not every populated organization NPI in the EPGH or AHD datasets will be a correct one. The process involves comparing existing data between the EPGH's fields and the CPAC's fields to match the organization with its NPI record (a process known as "record linkage"). The algorithm gets smarter over time, but it can never be trusted to provide complete accuracy. If the group or hospital data in the EPGH is insufficient, the NPI number provided by CarePrecise can be used to view the more complete NPI record in the CPAC (or better in many cases, the CoLoCode), with the proviso that relying on automation alone may produce mismatched records.
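The general shape of fuzzy record linkage, and the reason it can mismatch, can be shown with a short Python sketch. This is not the CarePrecise algorithm: it swaps in a plain string-similarity ratio from Python's standard difflib module, and the record fields, the 0.85 threshold, and the sample organizations are assumptions made for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio between two names, from 0.0 (no overlap) to 1.0 (identical)."""
    return SequenceMatcher(None, a.upper().strip(), b.upper().strip()).ratio()

def link_organization(org, candidates, threshold=0.85):
    """Return (npi, score) for the best same-ZIP name match, or (None, score) if too weak."""
    best_npi, best_score = None, 0.0
    for cand in candidates:
        if cand["zip"][:5] != org["zip"][:5]:
            continue  # only compare organizations in the same 5-digit ZIP
        score = similarity(org["name"], cand["name"])
        if score > best_score:
            best_npi, best_score = cand["npi"], score
    if best_score < threshold:
        return None, best_score
    return best_npi, best_score

# Invented candidate NPI records for illustration only.
candidates = [
    {"npi": "1000000001", "name": "EXAMPLE GENERAL HOSPITAL INC", "zip": "12345-6789"},
    {"npi": "1000000002", "name": "SAMPLE DENTAL CLINIC", "zip": "12345"},
    {"npi": "1000000003", "name": "EXAMPLE GENERAL HOSPITAL", "zip": "99999"},
]
linked_npi, linked_score = link_organization(
    {"name": "Example General Hospital", "zip": "12345"}, candidates
)
unlinked_npi, _ = link_organization(
    {"name": "Unrelated Pharmacy", "zip": "12345"}, candidates
)
```

Lowering the threshold links more records but admits more wrong ones, which is why a linked NPI should be treated as a strong hint and checked against other fields before it is relied upon.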