ExternalResources

This is a user guide to interacting with the ExternalResources class. The ExternalResources type is experimental and is subject to change in future releases. If you use this type, please provide feedback to the HDMF team so that we can improve the structure and access of data stored with this type for your use cases.

Introduction

The ExternalResources class provides a way to organize and map user terms (keys) to multiple resources and entities from the resources. A typical use case for external resources is to link data stored in datasets or attributes to ontologies. For example, you may have a dataset country storing locations. Using ExternalResources allows us to link the country names stored in the dataset to an ontology of all countries, enabling more rigid standardization of the data and facilitating data query and introspection.

From a user’s perspective, one can think of ExternalResources as a simple table in which each row associates a particular key stored in a particular object (e.g., an Attribute or Dataset in a file) with a particular entity (e.g., a term) of an online resource (e.g., an ontology). That is, (object, key) refer to parts inside a file, while (resource, entity) refer to an external resource outside the file, and ExternalResources allows us to link the two. To reduce data redundancy and improve data integrity, ExternalResources stores this data internally in a collection of interlinked tables.

KeyTable where each row describes a Key
ResourceTable where each row describes a Resource
EntityTable where each row describes an Entity
ObjectTable where each row describes an Object
ObjectKeyTable where each row describes an ObjectKey pair identifying which keys are used by which objects.

The ExternalResources class then provides convenience functions to simplify interaction with these tables, allowing users to treat ExternalResources as a single large table as much as possible.
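As a rough illustration of how these tables link together, the following plain-Python sketch (not HDMF code; the table contents, the placeholder object ID, and the helper `resolve` are hypothetical) stores one conceptual row across the five tables and follows the integer indices back to a flat record:

```python
# Hypothetical, simplified model of the interlinked tables (not HDMF internals).
# Each table is a list of rows; rows refer to other tables by list index.
keys = [{"key": "Homo sapiens"}]
resources = [{"resource": "NCBI_Taxonomy",
              "resource_uri": "https://www.ncbi.nlm.nih.gov/taxonomy"}]
entities = [{"keys_idx": 0, "resources_idx": 0, "entity_id": "NCBI:txid9606",
             "entity_uri": "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606"}]
objects = [{"object_id": "example-object-id", "field": ""}]  # placeholder object ID
object_keys = [{"objects_idx": 0, "keys_idx": 0}]


def resolve(object_key_idx):
    """Follow the index links from one ObjectKeyTable row to flat records."""
    link = object_keys[object_key_idx]
    obj = objects[link["objects_idx"]]
    key = keys[link["keys_idx"]]
    # Entities belong to a key, so collect every entity that points at this key.
    return [
        (obj["object_id"], key["key"],
         resources[ent["resources_idx"]]["resource"], ent["entity_id"])
        for ent in entities
        if ent["keys_idx"] == link["keys_idx"]
    ]


print(resolve(0))
# [('example-object-id', 'Homo sapiens', 'NCBI_Taxonomy', 'NCBI:txid9606')]
```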

Rules for ExternalResources

When using the ExternalResources class, there are rules for how users store information in the interlinked tables.

1. Multiple Key objects can have the same name. They are disambiguated by the Object associated with each, i.e., keys with the same name may exist in different objects, but within a particular object all keys must be unique. This means the KeyTable may contain duplicate entries, but the ObjectKeyTable must not assign duplicate keys to the same object.
2. To query specific records, the ExternalResources class uses ‘(object_id, relative_path, field, Key)’ as the unique identifier.
3. An Object can have multiple Key objects.
4. Multiple Object objects can use the same Key. Note that the Key may already be associated with resources and entities.
5. Do not use the private methods to add entries to the KeyTable, ResourceTable, EntityTable, ObjectTable, or ObjectKeyTable individually.
6. URIs are optional but highly recommended. If a URI is not known, an empty string may be used.
7. An entity ID should be the unique string identifying the entity in the given resource. It may or may not include a prefix naming the resource followed by a colon; use the format provided by the resource. For example, Identifiers.org uses the ID ncbigene:22353, but NCBI Gene uses the ID 22353 for the same term.
8. In most cases, Object objects will have an empty string for ‘field’. The ExternalResources class also supports compound data_types; in that case, ‘field’ is the field of the compound data_type that has an external reference.
9. In some cases, the attribute that needs an external reference is not an object with a ‘data_type’. The user must then use the nearest object that has a data type as the parent object. When adding an external resource for an object with a data type, users should not provide an attribute. When adding an external resource for an attribute of an object, users need to provide the name of the attribute.
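Rules 1 and 2 amount to a uniqueness constraint on (object_id, relative_path, field, key). A minimal sketch of that constraint, using a hypothetical `ObjectKeyRegistry` class and made-up object IDs rather than the actual ExternalResources internals:

```python
# Hypothetical registry illustrating rules 1 and 2 (not the HDMF implementation):
# key names may repeat across objects, but the tuple
# (object_id, relative_path, field, key) must be unique.
class ObjectKeyRegistry:
    def __init__(self):
        self._seen = set()

    def add(self, object_id, relative_path, field, key):
        ident = (object_id, relative_path, field, key)
        if ident in self._seen:
            raise ValueError(f"duplicate key {key!r} for object {object_id!r}")
        self._seen.add(ident)


registry = ObjectKeyRegistry()
registry.add("obj-A", "", "", "Rorb")
registry.add("obj-B", "", "", "Rorb")  # same key name, different object: allowed
try:
    registry.add("obj-A", "", "", "Rorb")  # same key for the same object: rejected
except ValueError as err:
    print(err)  # duplicate key 'Rorb' for object 'obj-A'
```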

Creating an instance of the ExternalResources class

from hdmf.common import ExternalResources
from hdmf.common import DynamicTable
from hdmf import Data
import numpy as np
# Ignore experimental feature warnings in the tutorial to improve rendering
import warnings
warnings.filterwarnings("ignore", category=UserWarning, message="ExternalResources is experimental*")

er = ExternalResources(name='example')

Using the add_ref method

add_ref is a wrapper function provided by the ExternalResources class that simplifies adding data. Using add_ref allows us to treat new entries similar to adding a new row to a flat table, with add_ref taking care of populating the underlying data structures accordingly.

data = Data(name="species", data=['Homo sapiens', 'Mus musculus'])
er.add_ref(
    container=data,
    key='Homo sapiens',
    resource_name='NCBI_Taxonomy',
    resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
    entity_id='NCBI:txid9606',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606'
)

key, resource, entity = er.add_ref(
    container=data,
    key='Mus musculus',
    resource_name='NCBI_Taxonomy',
    resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
    entity_id='NCBI:txid10090',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'
)

# Print result from the last add_ref call
print(key)
print(resource)
print(entity)

Row(1, keys) = {'key': 'Mus musculus'}
Row(0, resources) = {'resource': 'NCBI_Taxonomy', 'resource_uri': 'https://www.ncbi.nlm.nih.gov/taxonomy'}
Row(1, entities) = {'keys_idx': <hdmf.common.resources.Key object at 0x7f0dab26ad90>, 'resources_idx': <hdmf.common.resources.Resource object at 0x7f0dab26ad00>, 'entity_id': 'NCBI:txid10090', 'entity_uri': 'https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'}

Using the add_ref method with get_resource

When adding references to resources, you may want to refer to multiple entities within the same resource. Resource names are unique, so if you call add_ref with the name of an existing resource, then that resource will be reused. You can also use the get_resource method to get the Resource object and pass that in to add_ref to reuse an existing resource.

# Let's create a new instance of ExternalResources.
er = ExternalResources(name='example')

data = Data(name="species", data=['Homo sapiens', 'Mus musculus'])

er.add_ref(
    container=data,
    key='Homo sapiens',
    resource_name='NCBI_Taxonomy',
    resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
    entity_id='NCBI:txid9606',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606'
)

# Using get_resource
existing_resource = er.get_resource('NCBI_Taxonomy')
er.add_ref(
    container=data,
    key='Mus musculus',
    resources_idx=existing_resource,
    entity_id='NCBI:txid10090',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'
)

(<hdmf.common.resources.Key object at 0x7f0dc1076f70>, <hdmf.common.resources.Resource object at 0x7f0dc1076ca0>, <hdmf.common.resources.Entity object at 0x7f0dc1076c70>)
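The name-based reuse described above behaves like a get-or-create lookup. The sketch below is a hypothetical plain-Python model (the function `get_or_add_resource` is illustrative, not part of HDMF) showing that a second request for the same resource name returns the existing row instead of adding a new one:

```python
# Hypothetical sketch of "resource names are unique": look a resource up by name
# and only create a new row if the name is not present yet (not HDMF's code).
resources = []          # rows of (resource, resource_uri)
_resource_index = {}    # resource name -> row index


def get_or_add_resource(name, uri):
    """Return the row index for `name`, appending a new row only if needed."""
    if name in _resource_index:
        return _resource_index[name]
    resources.append((name, uri))
    _resource_index[name] = len(resources) - 1
    return _resource_index[name]


first = get_or_add_resource("NCBI_Taxonomy", "https://www.ncbi.nlm.nih.gov/taxonomy")
second = get_or_add_resource("NCBI_Taxonomy", "https://www.ncbi.nlm.nih.gov/taxonomy")
print(first, second, len(resources))  # 0 0 1
```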


Using the add_ref method with an attribute

It is important to keep in mind that when adding an Object to the ObjectTable, the parent object identified by object_id must be the closest parent to the target object (i.e., relative_path must be the shortest possible path and as such cannot contain any objects with a data_type and associated object_id).

A common example is the DynamicTable class, which holds VectorData objects as columns. To add an external reference to a column of a DynamicTable, use the column as the object, not the DynamicTable (see rule 9).

Note: add_ref internally resolves the object to the closest parent, so that er.add_ref(container=genotypes, attribute='genotype_name') and er.add_ref(container=genotypes.genotype_name, attribute=None) will ultimately both use the object_id of the genotypes.genotype_name VectorData column and not the object_id of the genotypes table.

genotypes = DynamicTable(name='genotypes', description='My genotypes')
genotypes.add_column(name='genotype_name', description="Name of genotypes")
genotypes.add_row(id=0, genotype_name='Rorb')
er.add_ref(
    container=genotypes,
    attribute='genotype_name',
    key='Rorb',
    resource_name='MGI Database',
    resource_uri='http://www.informatics.jax.org/',
    entity_id='MGI:1346434',
    entity_uri='http://www.informatics.jax.org/marker/MGI:1343464'
)

(<hdmf.common.resources.Key object at 0x7f0daaf364f0>, <hdmf.common.resources.Resource object at 0x7f0dc105ef10>, <hdmf.common.resources.Entity object at 0x7f0dc105e280>)
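The closest-parent rule can be pictured as walking up from the target until the first node that has a data_type (and hence an object_id) is reached, recording the skipped names as the relative path. This is a hypothetical sketch: the `Node` class, the IDs, and the "/"-joined path format are assumptions for illustration, not how HDMF represents containers or relative_path:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical container node; only nodes with a data_type get an object_id."""
    name: str
    object_id: Optional[str] = None
    parent: Optional["Node"] = None


def closest_parent(node):
    """Return (object_id, relative_path) of the nearest node that has an object_id."""
    skipped = []
    current = node
    while current.object_id is None:
        skipped.append(current.name)
        current = current.parent
        if current is None:
            raise ValueError("no enclosing object with a data_type")
    return current.object_id, "/".join(reversed(skipped))


table = Node("genotypes", object_id="table-id")
column = Node("genotype_name", object_id="column-id", parent=table)
attr = Node("description", parent=column)  # an attribute without its own data_type

print(closest_parent(column))  # ('column-id', '')
print(closest_parent(attr))    # ('column-id', 'description')
```

In this sketch the column, not the table, is the resolved parent for both calls, which mirrors the note above about add_ref using the VectorData column's object_id.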

Using the get_keys method

The get_keys method returns a DataFrame with the columns key_name, resources_idx, entity_id, and entity_uri. You can pass a single Key object, a list of Key objects, or no arguments to return all keys.

# All Keys
er.get_keys()

# Single Key
er.get_keys(keys=er.get_key('Homo sapiens'))

# List of Specific Keys
er.get_keys(keys=[er.get_key('Homo sapiens'), er.get_key('Mus musculus')])

	key_name	resources_idx	entity_id	entity_uri
0	Homo sapiens	0	NCBI:txid9606	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
1	Mus musculus	0	NCBI:txid10090	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
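Conceptually, get_keys joins the KeyTable with the EntityTable and optionally filters by key. A hypothetical plain-Python version over simplified tables, where integer indices stand in for Key objects (`get_keys_sketch` is not an HDMF function):

```python
# Simplified tables (hypothetical): key indices stand in for Key objects.
keys = [{"key": "Homo sapiens"}, {"key": "Mus musculus"}]
entities = [
    {"keys_idx": 0, "resources_idx": 0, "entity_id": "NCBI:txid9606"},
    {"keys_idx": 1, "resources_idx": 0, "entity_id": "NCBI:txid10090"},
]


def get_keys_sketch(key_indices=None):
    """Return (key_name, resources_idx, entity_id) rows for all or selected keys."""
    if key_indices is None:
        selected = set(range(len(keys)))  # no argument: all keys
    elif isinstance(key_indices, int):
        selected = {key_indices}          # a single key
    else:
        selected = set(key_indices)       # a list of keys
    return [
        (keys[ent["keys_idx"]]["key"], ent["resources_idx"], ent["entity_id"])
        for ent in entities
        if ent["keys_idx"] in selected
    ]


print(get_keys_sketch())
print(get_keys_sketch(0))
print(get_keys_sketch([0, 1]))
```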

Using the get_key method

The get_key method will return a Key object. In the current version of ExternalResources, duplicate keys are allowed; however, each key needs a unique linking Object. In other words, each combination of (container, relative_path, field, key) can exist only once in ExternalResources.

# The get_key method will return the key object of the unique (key, container, relative_path, field).
key_object = er.get_key(key_name='Rorb', container=genotypes.columns[0])
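Because the same key name can be used by several objects, a lookup by name alone may be ambiguous, which is why get_key also accepts the container. The sketch below models this with a hypothetical `find_key` helper over made-up records; its error messages are assumptions, not HDMF's behavior:

```python
# Hypothetical (object_id, relative_path, field, key) records; "Rorb" is used twice.
object_keys = [
    ("column-A", "", "", "Rorb"),
    ("column-B", "", "", "Rorb"),
]


def find_key(key_name, object_id=None, relative_path="", field=""):
    """Return the unique record for key_name, optionally scoped to one object."""
    matches = [
        rec for rec in object_keys
        if rec[3] == key_name
        and (object_id is None or rec[:3] == (object_id, relative_path, field))
    ]
    if not matches:
        raise ValueError(f"key {key_name!r} not found")
    if len(matches) > 1:
        raise ValueError(f"key {key_name!r} is ambiguous; specify the container")
    return matches[0]


print(find_key("Rorb", object_id="column-A"))  # ('column-A', '', '', 'Rorb')
try:
    find_key("Rorb")  # no container given, two matches
except ValueError as err:
    print(err)
```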

Using the add_ref method with a key_object

Multiple Object objects can use the same Key. To use an existing key when adding new entries into ExternalResources, pass the Key object instead of the ‘key_name’ to the add_ref method. If a ‘key_name’ is used, a new Key will be created.

er.add_ref(
    container=genotypes,
    attribute='genotype_name',
    key=key_object,
    resource_name='Ensembl',
    resource_uri='https://uswest.ensembl.org/index.html',
    entity_id='ENSG00000198963',
    entity_uri='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000198963'
)

# Let's use get_keys to visualize all the keys that have been added up to now
er.get_keys()

	key_name	resources_idx	entity_id	entity_uri
0	Homo sapiens	0	NCBI:txid9606	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
1	Mus musculus	0	NCBI:txid10090	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
2	Rorb	1	MGI:1346434	http://www.informatics.jax.org/marker/MGI:1343464
3	Rorb	2	ENSG00000198963	https://uswest.ensembl.org/Homo_sapiens/Gene/S...

Using get_object_resources

This method will return information regarding keys, resources, and entities for an Object. You can pass either the AbstractContainer object or its object ID for the container argument, and the corresponding relative_path and field.

er.get_object_resources(container=genotypes.columns[0])

	keys_idx	resource_idx	entity_id	entity_uri
0	2	1	MGI:1346434	http://www.informatics.jax.org/marker/MGI:1343464
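A get_object_resources-style query can be thought of as two hops: from the object to its keys via the ObjectKeyTable, then from those keys to their entities. A hypothetical sketch over simplified tables (`object_resources` and the IDs are illustrative only):

```python
# Hypothetical simplified tables: two objects, each using its own key.
objects = [{"object_id": "species-id", "field": ""},
           {"object_id": "genotype-column-id", "field": ""}]
object_keys = [{"objects_idx": 0, "keys_idx": 0},
               {"objects_idx": 1, "keys_idx": 1}]
entities = [{"keys_idx": 0, "resources_idx": 0, "entity_id": "NCBI:txid9606"},
            {"keys_idx": 1, "resources_idx": 1, "entity_id": "MGI:1346434"}]


def object_resources(object_id, field=""):
    """Return (keys_idx, resources_idx, entity_id) rows for one object."""
    # Hop 1: object -> its keys, via the ObjectKeyTable.
    obj_rows = {i for i, obj in enumerate(objects)
                if obj["object_id"] == object_id and obj["field"] == field}
    key_rows = {link["keys_idx"] for link in object_keys
                if link["objects_idx"] in obj_rows}
    # Hop 2: keys -> their entities, via the EntityTable.
    return [(ent["keys_idx"], ent["resources_idx"], ent["entity_id"])
            for ent in entities if ent["keys_idx"] in key_rows]


print(object_resources("genotype-column-id"))  # [(1, 1, 'MGI:1346434')]
```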

Special Case: Using add_ref with compound data

In most cases, the field is left as an empty string, but if the dataset or attribute is a compound data_type, then we can use the ‘field’ value to differentiate the different columns of the dataset. For example, if a dataset has a compound data_type with columns/fields ‘x’, ‘y’, and ‘z’, and each column/field is associated with different ontologies, then use field=’x’ to denote that ‘x’ is using the external reference.

# Let's create a new instance of ExternalResources.
er = ExternalResources(name='example')

data = Data(
    name='data_name',
    data=np.array(
        [('Mus musculus', 9, 81.0), ('Homo sapiens', 3, 27.0)],
        dtype=[('species', 'U14'), ('age', 'i4'), ('weight', 'f4')]
    )
)

er.add_ref(
    container=data,
    field='species',
    key='Mus musculus',
    resource_name='NCBI_Taxonomy',
    resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
    entity_id='NCBI:txid10090',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'
)

(<hdmf.common.resources.Key object at 0x7f0dc107e940>, <hdmf.common.resources.Resource object at 0x7f0dc107e550>, <hdmf.common.resources.Entity object at 0x7f0dc107e5b0>)
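To see why the field matters for compound data, the following hypothetical sketch stands in for the compound dataset with namedtuple rows and keys the references by (object_id, field, key), so only values of the 'species' field resolve to an entity. The `refs` dictionary, the object ID, and the `entity_for` helper are illustrative, not HDMF structures:

```python
from collections import namedtuple

# Hypothetical stand-in for the compound dataset above: one record per row,
# with the fields species, age, and weight.
Record = namedtuple("Record", ["species", "age", "weight"])
rows = [Record("Mus musculus", 9, 81.0), Record("Homo sapiens", 3, 27.0)]

# References keyed by (object_id, field, key); only 'species' is linked here.
refs = {
    ("data-id", "species", "Mus musculus"): "NCBI:txid10090",
    ("data-id", "species", "Homo sapiens"): "NCBI:txid9606",
}


def entity_for(row, field_name, object_id="data-id"):
    """Look up the entity ID for the value stored in one field of a record."""
    return refs.get((object_id, field_name, getattr(row, field_name)))


print([entity_for(r, "species") for r in rows])  # ['NCBI:txid10090', 'NCBI:txid9606']
print(entity_for(rows[0], "age"))                # None, since 'age' has no reference
```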

Note that because the container is a Data object, and the external resource is being associated with the values of the dataset rather than an attribute of the dataset, the field must be prefixed with ‘data’. Normally, to associate an external resource with the values of the dataset, the field can be left blank. This allows us to differentiate between a dataset compound data type field named ‘x’ and a dataset attribute named ‘x’.

er.add_ref(
    container=data,
    field='species',
    key='Homo sapiens',
    resource_name='NCBI_Taxonomy',
    resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
    entity_id='NCBI:txid9606',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606'
)

(<hdmf.common.resources.Key object at 0x7f0dc1082970>, <hdmf.common.resources.Resource object at 0x7f0dc131f790>, <hdmf.common.resources.Entity object at 0x7f0da1faaa30>)

Convert ExternalResources to a single DataFrame

er = ExternalResources(name='example')

data1 = Data(
    name='data_name',
    data=np.array(
        [('Mus musculus', 9, 81.0), ('Homo sapiens', 3, 27.0)],
        dtype=[('species', 'U14'), ('age', 'i4'), ('weight', 'f4')]
    )
)

k1, r1, e1 = er.add_ref(
    container=data1,
    field='species',
    key='Mus musculus',
    resource_name='NCBI_Taxonomy',
    resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
    entity_id='NCBI:txid10090',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090'
)


k2, r2, e2 = er.add_ref(
    container=data1,
    field='species',
    key='Homo sapiens',
    resource_name='NCBI_Taxonomy',
    resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
    entity_id='NCBI:txid9606',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606'
)

# Reuse the same keys, resources, and entities for data2, and add an extra key ('Pongo abelii') only for data2
data2 = Data(name="species", data=['Homo sapiens', 'Mus musculus', 'Pongo abelii'])

o2 = er._add_object(data2, relative_path='', field='')
er._add_object_key(o2, k1)
er._add_object_key(o2, k2)

k2, r2, e2 = er.add_ref(
    container=data2,
    field='',
    key='Pongo abelii',
    resource_name='NCBI_Taxonomy',
    resource_uri='https://www.ncbi.nlm.nih.gov/taxonomy',
    entity_id='NCBI:txid9601',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9601'
)

# Question:
# - Can add_ref be used to associate two different objects with the same keys, resources, and entities?
#    - Here we use the private _add_object and _add_object_key methods to do this, but should this not
#      be possible with add_ref? Specifically, add_ref allows Key and Resource objects to be reused on
#      input, but not Entity objects. Why?
#      E.g., should we be able to do:
#      er.add_ref(
#         container=data2,
#         field='',
#         key=k1,
#         resources_idx=r1,
#         entity_id=e1      # <-- not allowed
#      )
#

genotypes = DynamicTable(name='genotypes', description='My genotypes')
genotypes.add_column(name='genotype_name', description="Name of genotypes")
genotypes.add_row(id=0, genotype_name='Rorb')
k3, r3, e3 = er.add_ref(
    container=genotypes['genotype_name'],
    field='',
    key='Rorb',
    resource_name='MGI Database',
    resource_uri='http://www.informatics.jax.org/',
    entity_id='MGI:1346434',
    entity_uri='http://www.informatics.jax.org/marker/MGI:1343464'
)
er.add_ref(
    container=genotypes['genotype_name'],
    field='',
    key=k3,
    resource_name='Ensembl',
    resource_uri='https://uswest.ensembl.org/index.html',
    entity_id='ENSG00000198963',
    entity_uri='https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000198963'
)

(<hdmf.common.resources.Key object at 0x7f0da1f539d0>, <hdmf.common.resources.Resource object at 0x7f0da1f53df0>, <hdmf.common.resources.Entity object at 0x7f0da1f53e50>)
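Attaching existing keys to a second object works because the ObjectKeyTable is a many-to-many link between objects and keys. A small plain-Python sketch (not HDMF code) that inverts the (objects_idx, keys_idx) pairs this example's object_keys table contains to find keys shared by several objects:

```python
from collections import defaultdict

# (objects_idx, keys_idx) pairs as in this example's object_keys table:
# data1 -> keys 0, 1; data2 -> keys 0, 1, 2; genotype_name column -> key 3.
object_keys = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 3)]

objects_per_key = defaultdict(list)
for obj_idx, key_idx in object_keys:
    objects_per_key[key_idx].append(obj_idx)

# Keys linked to more than one object are stored once but referenced twice.
shared = {k: v for k, v in objects_per_key.items() if len(v) > 1}
print(shared)  # {0: [0, 1], 1: [0, 1]}
```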

Convert the individual tables to DataFrames

er.keys.to_dataframe()

	key
0	Mus musculus
1	Homo sapiens
2	Pongo abelii
3	Rorb

er.resources.to_dataframe()

	resource	resource_uri
0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy
1	MGI Database	http://www.informatics.jax.org/
2	Ensembl	https://uswest.ensembl.org/index.html

Note that key 3 has two entities assigned to it in the entities table.

er.entities.to_dataframe()

	keys_idx	resources_idx	entity_id	entity_uri
0	0	0	NCBI:txid10090	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
1	1	0	NCBI:txid9606	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
2	2	0	NCBI:txid9601	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
3	3	1	MGI:1346434	http://www.informatics.jax.org/marker/MGI:1343464
4	3	2	ENSG00000198963	https://uswest.ensembl.org/Homo_sapiens/Gene/S...

er.objects.to_dataframe()

	object_id	field
0	1246247a-3816-4fa5-95ce-62816294edf6	species
1	05a47117-271a-4147-adb8-e6d3c476d2c0
2	4fba041b-0585-4410-bcca-371f4b79fca4

Note that keys 0 and 1 are used by both object 0 and object 1 in the object_keys table.

er.object_keys.to_dataframe()

	objects_idx	keys_idx
0	0	0
1	0	1
2	1	0
3	1	1
4	1	2
5	2	3

Convert the whole ExternalResources to a single DataFrame

Using the to_dataframe method of ExternalResources, we can convert the data from the corresponding Keys, Resources, Entities, Objects, and ObjectKeys tables into a single joint DataFrame. In this conversion the data is denormalized, so that, e.g., Keys that are used by multiple Objects or Entities are duplicated across the corresponding rows. Here this is the case, e.g., for the keys "Homo sapiens" and "Mus musculus", which are used by the first two objects (rows with index=[0, 1, 2, 3]), and for the Rorb key, which appears with both the MGI Database and Ensembl resources (rows with index=[5, 6]).

er.to_dataframe()

	objects_idx	object_id	field	keys_idx	key	resources_idx	resource	resource_uri	entities_idx	entity_id	entity_uri
0	0	1246247a-3816-4fa5-95ce-62816294edf6	species	0	Mus musculus	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	0	NCBI:txid10090	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
1	0	1246247a-3816-4fa5-95ce-62816294edf6	species	1	Homo sapiens	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	1	NCBI:txid9606	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
2	1	05a47117-271a-4147-adb8-e6d3c476d2c0		0	Mus musculus	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	0	NCBI:txid10090	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
3	1	05a47117-271a-4147-adb8-e6d3c476d2c0		1	Homo sapiens	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	1	NCBI:txid9606	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
4	1	05a47117-271a-4147-adb8-e6d3c476d2c0		2	Pongo abelii	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	2	NCBI:txid9601	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
5	2	4fba041b-0585-4410-bcca-371f4b79fca4		3	Rorb	1	MGI Database	http://www.informatics.jax.org/	3	MGI:1346434	http://www.informatics.jax.org/marker/MGI:1343464
6	2	4fba041b-0585-4410-bcca-371f4b79fca4		3	Rorb	2	Ensembl	https://uswest.ensembl.org/index.html	4	ENSG00000198963	https://uswest.ensembl.org/Homo_sapiens/Gene/S...
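The duplication comes from joining every ObjectKeyTable row with every entity of its key. A hypothetical sketch of that join over small, simplified tables (plain Python lists with made-up object IDs, not the HDMF implementation of to_dataframe):

```python
# Simplified tables (hypothetical values) to illustrate denormalization.
keys = ["Homo sapiens", "Rorb"]
resources = ["NCBI_Taxonomy", "MGI Database", "Ensembl"]
entities = [  # (keys_idx, resources_idx, entity_id)
    (0, 0, "NCBI:txid9606"),
    (1, 1, "MGI:1346434"),
    (1, 2, "ENSG00000198963"),
]
objects = ["obj-0", "obj-1", "obj-2"]
object_keys = [(0, 0), (1, 0), (2, 1)]  # (objects_idx, keys_idx)


def denormalize():
    """Emit one flat row per (object, key, entity) combination."""
    rows = []
    for obj_idx, key_idx in object_keys:
        for ent_key_idx, res_idx, entity_id in entities:
            if ent_key_idx == key_idx:
                rows.append((objects[obj_idx], keys[key_idx], resources[res_idx], entity_id))
    return rows


for row in denormalize():
    print(row)
```

Here "Homo sapiens" is repeated because two objects use it, and "Rorb" is repeated because it has two entities, which is the same pattern as in the table above.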

By setting use_categories=True, the function uses a pandas.MultiIndex on the columns to also indicate the category each column belongs to (i.e., objects, keys, resources, or entities). Note: The category in the combined table is not the same as the name of the source table; rather, it represents the semantic category. For example, keys_idx appears as a foreign key in both the ObjectKeys and Entities tables, but in the combined table it is a logical property of the keys.

er.to_dataframe(use_categories=True)

	objects			keys		resources			entities
	objects_idx	object_id	field	keys_idx	key	resources_idx	resource	resource_uri	entities_idx	entity_id	entity_uri
0	0	1246247a-3816-4fa5-95ce-62816294edf6	species	0	Mus musculus	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	0	NCBI:txid10090	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
1	0	1246247a-3816-4fa5-95ce-62816294edf6	species	1	Homo sapiens	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	1	NCBI:txid9606	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
2	1	05a47117-271a-4147-adb8-e6d3c476d2c0		0	Mus musculus	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	0	NCBI:txid10090	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
3	1	05a47117-271a-4147-adb8-e6d3c476d2c0		1	Homo sapiens	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	1	NCBI:txid9606	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
4	1	05a47117-271a-4147-adb8-e6d3c476d2c0		2	Pongo abelii	0	NCBI_Taxonomy	https://www.ncbi.nlm.nih.gov/taxonomy	2	NCBI:txid9601	https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
5	2	4fba041b-0585-4410-bcca-371f4b79fca4		3	Rorb	1	MGI Database	http://www.informatics.jax.org/	3	MGI:1346434	http://www.informatics.jax.org/marker/MGI:1343464
6	2	4fba041b-0585-4410-bcca-371f4b79fca4		3	Rorb	2	Ensembl	https://uswest.ensembl.org/index.html	4	ENSG00000198963	https://uswest.ensembl.org/Homo_sapiens/Gene/S...
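The column grouping amounts to pairing each column with a category. A small sketch that builds such (category, column) pairs from a hypothetical mapping following the table above; a list of tuples like this is the kind of input `pandas.MultiIndex.from_tuples` accepts, although this sketch does not call pandas:

```python
# Hypothetical mapping from semantic category to flat columns, following the
# grouping shown above (keys_idx belongs to "keys", not to a source table).
categories = {
    "objects": ["objects_idx", "object_id", "field"],
    "keys": ["keys_idx", "key"],
    "resources": ["resources_idx", "resource", "resource_uri"],
    "entities": ["entities_idx", "entity_id", "entity_uri"],
}

# One (category, column) pair per column, in display order.
column_tuples = [(cat, col) for cat, cols in categories.items() for col in cols]
print(column_tuples[:4])
# [('objects', 'objects_idx'), ('objects', 'object_id'), ('objects', 'field'), ('keys', 'keys_idx')]
```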

Export ExternalResources to SQLite

# Set the database file to use and clean up the file if it exists
import os
db_file = "test_externalresources.sqlite"
if os.path.exists(db_file):
    os.remove(db_file)

Export the data stored in the ExternalResources object to a SQLite database.

er.export_to_sqlite(db_file)
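Since SQLite row IDs conventionally start at 1, exported indices and foreign keys are shifted by one relative to the 0-based tables in Python. The following hypothetical sketch uses only the standard-library sqlite3 module and an in-memory database to show that convention for two simplified tables; the schema here is an assumption and does not reproduce the actual schema written by export_to_sqlite:

```python
import sqlite3
from contextlib import closing

# Hypothetical simplified tables with 0-based indices, as in Python.
keys = ["Homo sapiens", "Mus musculus"]
entities = [(0, "NCBI:txid9606"), (1, "NCBI:txid10090")]  # (keys_idx, entity_id)

with closing(sqlite3.connect(":memory:")) as db:
    db.execute("CREATE TABLE keys (id INTEGER PRIMARY KEY, key TEXT)")
    db.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, "
               "keys_idx INTEGER REFERENCES keys(id), entity_id TEXT)")
    # Shift both the row IDs and the foreign-key column by +1 for SQLite.
    db.executemany("INSERT INTO keys VALUES (?, ?)",
                   [(i + 1, k) for i, k in enumerate(keys)])
    db.executemany("INSERT INTO entities VALUES (?, ?, ?)",
                   [(i + 1, k_idx + 1, eid) for i, (k_idx, eid) in enumerate(entities)])
    # Joining on the shifted IDs recovers the original key/entity pairing.
    joined = db.execute(
        "SELECT k.key, e.entity_id FROM entities e JOIN keys k ON e.keys_idx = k.id "
        "ORDER BY e.id"
    ).fetchall()

print(joined)  # [('Homo sapiens', 'NCBI:txid9606'), ('Mus musculus', 'NCBI:txid10090')]
```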

Test that the generated SQLite database is correct

import sqlite3
import pandas as pd
from contextlib import closing

with closing(sqlite3.connect(db_file)) as db:
    cursor = db.cursor()
    # read all tables
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = cursor.fetchall()
    # convert all tables to pandas and compare with the original tables
    for table_name in tables:
        table_name = table_name[0]
        table = pd.read_sql_query("SELECT * from %s" % table_name, db)
        table = table.set_index('id')
        ref_table = getattr(er, table_name).to_dataframe()
        assert np.all(np.array(table.index) == np.array(ref_table.index) + 1)
        for c in table.columns:
            # NOTE: SQLite uses 1-based row indices, so we need to adjust for that
            if np.issubdtype(table[c].dtype, np.integer):
                assert np.all(np.array(table[c]) == np.array(ref_table[c]) + 1)
            else:
                assert np.all(np.array(table[c]) == np.array(ref_table[c]))
    cursor.close()

Remove the test file

os.remove(db_file)
