client

bigquery.client.get_client(project_id=None, credentials=None, service_url=None, service_account=None, private_key=None, private_key_file=None, json_key=None, json_key_file=None, readonly=True, swallow_results=True, num_retries=0)

Return a singleton instance of BigQueryClient. Either AssertionCredentials or a service account and private key combination must be provided to authenticate requests to BigQuery.

Parameters:

project_id : str, optional

The BigQuery project id, required unless json_key or json_key_file is provided.

credentials : oauth2client.client.SignedJwtAssertionCredentials, optional

AssertionCredentials instance used to authenticate requests to BigQuery. If not provided, you must supply service_account together with private_key or private_key_file, or json_key or json_key_file.

service_url : str, optional

A URI string template pointing to the location of Google’s API discovery service. Requires two parameters {api} and {apiVersion} that when filled in produce an absolute URI to the discovery document for that service. If not set then the default googleapiclient discovery URI is used. See credentials

service_account : str, optional

The Google API service account name. See credentials

private_key : str, optional

The private key associated with the service account in PKCS12 or PEM format. See credentials

private_key_file : str, optional

The name of the file containing the private key associated with the service account in PKCS12 or PEM format. See credentials

json_key : dict, optional

The JSON key associated with the service account. See credentials

json_key_file : str, optional

The name of the JSON key file associated with the service account. See credentials.

readonly : bool

Bool indicating if BigQuery access is read-only. Has no effect if credentials are provided. Default True.

swallow_results : bool

If set to False, then return the actual response value instead of converting to boolean. Default True.

num_retries : int, optional

The number of times to retry the request. Default 0 (no retry).

Returns:

BigQueryClient

An instance of the BigQuery client.
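
For example, a minimal sketch of obtaining a client with a service-account JSON key file (the project id and key path below are placeholders):

    from bigquery.client import get_client

    # Placeholder project id and key file path; substitute your own.
    client = get_client(project_id='my-project',
                        json_key_file='service-account-key.json',
                        readonly=True)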

bigquery.client.get_projects(bq_service)

Given the BigQuery service, return data about all projects.

BigQueryClient Class

class bigquery.client.BigQueryClient(bq_service, project_id, swallow_results=True, num_retries=0)

Methods

check_dataset(dataset_id[, project_id]) Check to see if a dataset exists.
check_job(job_id) Return the state and number of results of a query by job id.
check_table(dataset, table[, project_id]) Check to see if a table exists.
create_dataset(dataset_id[, friendly_name, …]) Create a new BigQuery dataset.
create_table(dataset, table, schema[, …]) Create a new table in the dataset.
create_view(dataset, view, query[, …]) Create a new view in the dataset.
dataset_resource(ref_id[, friendly_name, …]) See https://developers.
delete_dataset(dataset_id[, …]) Delete a BigQuery dataset.
delete_table(dataset, table[, project_id]) Delete a table from the dataset.
export_data_to_uris(destination_uris, …[, …]) Export data from a BigQuery table to cloud storage.
get_all_tables(dataset_id[, project_id]) Retrieve a list of tables for the dataset.
get_dataset(dataset_id[, project_id]) Retrieve a dataset if it exists, otherwise return an empty dict.
get_datasets([project_id]) List all datasets in the project.
get_query_results(job_id[, offset, limit, …]) Execute the query job indicated by the given job id.
get_query_rows(job_id[, offset, limit, timeout]) Retrieve a list of rows from a query table by job id.
get_query_schema(job_id) Retrieve the schema of a query by job id.
get_table(dataset, table[, project_id]) Retrieve a table if it exists, otherwise return an empty dict.
get_table_schema(dataset, table[, project_id]) Return the table schema.
get_tables(dataset_id, app_id, start_time, …) Retrieve a list of tables that are related to the given app id and are inside the range of start and end times.
import_data_from_uris(source_uris, dataset, …) Imports data into a BigQuery table from cloud storage.
patch_dataset(dataset_id[, friendly_name, …]) Updates information in an existing dataset.
patch_table(dataset, table, schema[, project_id]) Patch an existing table in the dataset.
push_rows(dataset, table, rows[, …]) Upload rows to BigQuery table.
query(query[, max_results, timeout, …]) Submit a query to BigQuery.
schema_from_record(record) Given a dict representing a record instance to be inserted into BigQuery, calculate the schema.
update_dataset(dataset_id[, friendly_name, …]) Updates information in an existing dataset.
update_table(dataset, table, schema[, …]) Update an existing table in the dataset.
wait_for_job(job[, interval, timeout]) Waits until the job indicated by job_resource is done or has failed
write_to_table(query[, dataset, table, …]) Write query result to table.
check_dataset(dataset_id, project_id=None)

Check to see if a dataset exists.

Parameters:

dataset_id : str

Dataset unique id

project_id: str, optional

The project the dataset is in

Returns:

bool

True if dataset at dataset_id exists, else False

check_job(job_id)

Return the state and number of results of a query by job id.

Parameters:

job_id : str

The job id of the query to check.

Returns:

tuple

(bool, int) Whether or not the query has completed and the total number of rows included in the query table if it has completed (else 0)

check_table(dataset, table, project_id=None)

Check to see if a table exists.

Parameters:

dataset : str

The dataset to check

table : str

The name of the table

project_id: str, optional

The project the table is in

Returns:

bool

True if table exists, else False
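
A short sketch combining the existence checks above, assuming client was obtained via get_client() as shown earlier and using placeholder dataset and table names:

    # Check the dataset first, then a table inside it.
    if client.check_dataset('my_dataset'):
        table_exists = client.check_table('my_dataset', 'events')
        print('events table exists:', table_exists)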

create_dataset(dataset_id, friendly_name=None, description=None, access=None, location=None, project_id=None)

Create a new BigQuery dataset.

Parameters:

dataset_id : str

Unique str identifying the dataset within the project (the referenceId of the dataset, not the integer id of the dataset)

friendly_name: str, optional

A human readable name

description: str, optional

Longer string providing a description

access : list, optional

Indicating access permissions

location : str, optional

Indicating where dataset should be stored: EU or US (see https://developers.google.com/bigquery/docs/reference/v2/datasets#resource)

project_id: str, optional

Unique str identifying the BigQuery project containing the dataset

Returns:

Union[bool, dict]

bool indicating if the dataset was created or not, or response from BigQuery if swallow_results is set to False
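
For illustration, a hedged sketch that creates a dataset with placeholder names, assuming client from get_client():

    # With the default swallow_results=True this returns a bool.
    created = client.create_dataset('my_dataset',
                                    friendly_name='My dataset',
                                    description='Example dataset created via the API',
                                    location='EU')
    print('dataset created:', created)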

create_table(dataset, table, schema, expiration_time=None, time_partitioning=False, project_id=None)

Create a new table in the dataset.

Parameters:

dataset : str

The dataset to create the table in

table : str

The name of the table to create

schema : dict

The table schema

expiration_time : int or float, optional

The expiry time in milliseconds since the epoch.

time_partitioning : bool, optional

Create a time partitioning.

project_id: str, optional

The project to create the table in

Returns:

Union[bool, dict]

bool indicating if the table was successfully created, or response from BigQuery if swallow_results is set to False
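
A minimal sketch of creating a table; the schema below uses BigQuery's usual name/type/mode field entries, which is an assumption about the expected shape, and all names are placeholders:

    # Placeholder schema and names.
    schema = [
        {'name': 'id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
        {'name': 'name', 'type': 'STRING', 'mode': 'NULLABLE'},
    ]
    created = client.create_table('my_dataset', 'events', schema)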

create_view(dataset, view, query, use_legacy_sql=None, project_id=None)

Create a new view in the dataset.

Parameters:

dataset : str

The dataset to create the view in

view : str

The name of the view to create

query : dict

A query that BigQuery executes when the view is referenced.

use_legacy_sql : bool, optional

If False, the query will use BigQuery’s standard SQL (https://cloud.google.com/bigquery/sql-reference/)

project_id: str, optional

The project to create the view in

Returns:

Union[bool, dict]

bool indicating if the view was successfully created or not, or response from BigQuery if swallow_results is set to False.
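
A sketch of creating a standard-SQL view; the view definition is passed as a SQL string, which is an assumption about the expected form of query, and all names are placeholders:

    # Create a view over the placeholder events table.
    created = client.create_view(
        'my_dataset',
        'events_view',
        'SELECT id, name FROM my_dataset.events',
        use_legacy_sql=False,
    )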

dataset_resource(ref_id, friendly_name=None, description=None, access=None, location=None, project_id=None)

See https://developers.google.com/bigquery/docs/reference/v2/datasets#resource

Parameters:

ref_id : str

Dataset id (the reference id, not the integer id)

friendly_name : str, optional

An optional descriptive name for the dataset

description : str, optional

An optional description for the dataset

access : list, optional

Indicating access permissions

location: str, optional, ‘EU’ or ‘US’

An optional geographical location for the dataset (EU or US)

project_id: str, optional

Unique str identifying the BigQuery project containing the dataset

Returns:

dict

Representing BigQuery dataset resource

delete_dataset(dataset_id, delete_contents=False, project_id=None)

Delete a BigQuery dataset.

Parameters:

dataset_id : str

Unique str identifying the dataset within the project (the referenceId of the dataset)

delete_contents : bool, optional

If True, forces the deletion of the dataset even when the dataset contains data (Default = False)

project_id: str, optional

Unique str identifying the BigQuery project containing the dataset

Returns:

Union[bool, dict]

bool indicating if the delete was successful or not, or response from BigQuery if swallow_results is set to False

Raises:

HttpError

404 when dataset with dataset_id does not exist

delete_table(dataset, table, project_id=None)

Delete a table from the dataset.

Parameters:

dataset : str

The dataset to delete the table from.

table : str

The name of the table to delete

project_id: str, optional

String id of the project

Returns:

Union[bool, dict]

bool indicating if the table was successfully deleted or not, or response from BigQuery if swallow_results is set to False.

export_data_to_uris(destination_uris, dataset, table, job=None, compression=None, destination_format=None, print_header=None, field_delimiter=None, project_id=None)

Export data from a BigQuery table to cloud storage. Optional arguments that are not specified are determined by BigQuery as described: https://developers.google.com/bigquery/docs/reference/v2/jobs

Parameters:

destination_uris : Union[str, list]

str or list of str objects representing the URIs on cloud storage of the form: gs://bucket/filename

dataset : str

String id of the dataset

table : str

String id of the table

job : str, optional

String identifying the job (a unique job id is automatically generated if not provided)

compression : str, optional

One of the JOB_COMPRESSION_* constants

destination_format : str, optional

One of the JOB_DESTINATION_FORMAT_* constants

print_header : bool, optional

Whether or not to print the header

field_delimiter : str, optional

Character separating fields in delimited file

project_id: str, optional

String id of the project

Returns:

dict

A BigQuery job resource

Raises:

JobInsertException

On http/auth failures or error in result
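
A hedged sketch of exporting a table to Cloud Storage and blocking on the returned job with wait_for_job() (documented below); the bucket and table names are placeholders:

    # Export the placeholder events table to a GCS wildcard URI.
    job = client.export_data_to_uris(
        ['gs://my-bucket/events-*.json'],
        'my_dataset',
        'events',
    )
    # Block until the export job completes (or the timeout is reached).
    job = client.wait_for_job(job, interval=5, timeout=300)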

get_all_tables(dataset_id, project_id=None)

Retrieve a list of tables for the dataset.

Parameters:

dataset_id : str

The dataset to retrieve table data for.

project_id: str, optional

Unique str identifying the BigQuery project containing the dataset

Returns:

list

A list with all table names

get_dataset(dataset_id, project_id=None)

Retrieve a dataset if it exists, otherwise return an empty dict.

Parameters:

dataset_id : str

Dataset unique id

project_id: str, optional

The project the dataset is in

Returns:

dict

Contains dataset object if it exists, else empty

get_datasets(project_id=None)

List all datasets in the project.

Parameters:

project_id: str, optional

Unique str identifying the BigQuery project containing the dataset

Returns:

list

Dataset resources

get_query_results(job_id, offset=None, limit=None, page_token=None, timeout=0)

Execute the query job indicated by the given job id. This is a direct mapping to the BigQuery API: https://cloud.google.com/bigquery/docs/reference/v2/jobs/getQueryResults

Parameters:

job_id : str

The job id of the query to check

offset : int, optional

The index the result set should start at.

limit : int, optional

The maximum number of results to retrieve.

page_token : str, optional

Page token, returned by previous call, to request the next page of results.

timeout : float, optional

Timeout in seconds

Returns:

out

The query reply

get_query_rows(job_id, offset=None, limit=None, timeout=0)

Retrieve a list of rows from a query table by job id. This method will append results from multiple pages together. If you want to manually page through results, you can use get_query_results method directly.

Parameters:

job_id : str

The job id that references a BigQuery query.

offset : int, optional

The offset of the rows to pull from BigQuery

limit : int, optional

The number of rows to retrieve from a query table.

timeout : float, optional

Timeout in seconds.

Returns:

list

A list of dict objects that represent table rows.
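
A sketch tying this together: submit a query with query() (documented below), confirm completion with check_job(), then pull its rows (placeholder query and names):

    # Submit the query and poll for completion before fetching rows.
    job_id, _ = client.query('SELECT id, name FROM my_dataset.events',
                             use_legacy_sql=False)
    complete, row_count = client.check_job(job_id)
    if complete:
        rows = client.get_query_rows(job_id, limit=row_count)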

get_query_schema(job_id)

Retrieve the schema of a query by job id.

Parameters:

job_id : str

The job_id that references a BigQuery query

Returns:

list

A list of dict objects that represent the schema.

get_table(dataset, table, project_id=None)

Retrieve a table if it exists, otherwise return an empty dict.

Parameters:

dataset : str

The dataset that the table is in

table : str

The name of the table

project_id: str, optional

The project that the table is in

Returns:

dict

Containing the table object if it exists, else empty

get_table_schema(dataset, table, project_id=None)

Return the table schema.

Parameters:

dataset : str

The dataset containing the table.

table : str

The table to get the schema for

project_id: str, optional

The project of the dataset.

Returns:

list

A list of dict objects that represent the table schema. If the table doesn’t exist, None is returned.

get_tables(dataset_id, app_id, start_time, end_time, project_id=None)

Retrieve a list of tables that are related to the given app id and are inside the range of start and end times.

Parameters:

dataset_id : str

The BigQuery dataset id to consider.

app_id : str

The appspot name

start_time : Union[datetime, int]

The datetime or unix time after which records will be fetched.

end_time : Union[datetime, int]

The datetime or unix time up to which records will be fetched.

project_id: str, optional

String id of the project

Returns:

list

A list of table names.

import_data_from_uris(source_uris, dataset, table, schema=None, job=None, source_format=None, create_disposition=None, write_disposition=None, encoding=None, ignore_unknown_values=None, max_bad_records=None, allow_jagged_rows=None, allow_quoted_newlines=None, field_delimiter=None, quote=None, skip_leading_rows=None, project_id=None)

Imports data into a BigQuery table from cloud storage. Optional arguments that are not specified are determined by BigQuery as described: https://developers.google.com/bigquery/docs/reference/v2/jobs

Parameters:

source_uris : list

A list of str objects representing the URIs on cloud storage of the form: gs://bucket/filename

dataset : str

String id of the dataset

table : str

String id of the table

schema : list, optional

Represents the BigQuery schema

job : str, optional

Identifies the job (a unique job id is automatically generated if not provided)

source_format : str, optional

One of the JOB_SOURCE_FORMAT_* constants

create_disposition : str, optional

One of the JOB_CREATE_* constants

write_disposition : str, optional

One of the JOB_WRITE_* constants

encoding : str, optional

One of the JOB_ENCODING_* constants

ignore_unknown_values : bool, optional

Whether or not to ignore unknown values

max_bad_records : int, optional

Maximum number of bad records

allow_jagged_rows : bool, optional

For csv only

allow_quoted_newlines : bool, optional

For csv only

field_delimiter : str, optional

For csv only

quote : str, optional

Quote character for csv only

skip_leading_rows : int, optional

For csv only

project_id: str, optional

String id of the project

Returns:

dict

A BigQuery job response

Raises:

JobInsertException

On http/auth failures or error in result
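
A hedged sketch of loading CSV data from Cloud Storage and waiting for the job; the bucket, table, and schema are placeholders:

    # Load a placeholder CSV file into the events table.
    job = client.import_data_from_uris(
        ['gs://my-bucket/events.csv'],
        'my_dataset',
        'events',
        schema=[{'name': 'id', 'type': 'INTEGER'},
                {'name': 'name', 'type': 'STRING'}],
        skip_leading_rows=1,
    )
    # Block until the load job completes (or the timeout is reached).
    job = client.wait_for_job(job, interval=10, timeout=600)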

patch_dataset(dataset_id, friendly_name=None, description=None, access=None, project_id=None)

Updates information in an existing dataset. The update method replaces the entire dataset resource, whereas the patch method only replaces fields that are provided in the submitted dataset resource.

Parameters:

dataset_id : str

Unique str identifying the dataset within the project (the referenceId of the dataset)

friendly_name : str, optional

An optional descriptive name for the dataset.

description : str, optional

An optional description of the dataset.

access : list, optional

Indicating access permissions.

project_id: str, optional

Unique str identifying the BigQuery project containing the dataset

Returns:

Union[bool, dict]

bool indicating if the patch was successful or not, or response from BigQuery if swallow_results is set to False.

patch_table(dataset, table, schema, project_id=None)

Patch an existing table in the dataset.

Parameters:

dataset : str

The dataset to patch the table in

table : str

The name of the table to patch

schema : dict

The table schema

project_id: str, optional

The project to patch the table in

Returns:

Union[bool, dict]

Bool indicating if the table was successfully patched or not, or response from BigQuery if swallow_results is set to False

push_rows(dataset, table, rows, insert_id_key=None, skip_invalid_rows=None, ignore_unknown_values=None, template_suffix=None, project_id=None)

Upload rows to BigQuery table.

Parameters:

dataset : str

The dataset to upload to

table : str

The name of the table to insert rows into

rows : list

A list of rows (dict objects) to add to the table

insert_id_key : str, optional

Key for insertId in row. You can use dot separated key for nested column.

skip_invalid_rows : bool, optional

Insert all valid rows of a request, even if invalid rows exist.

ignore_unknown_values : bool, optional

Accept rows that contain values that do not match the schema.

template_suffix : str, optional

Inserts the rows into the table {table}{template_suffix}. If {table}{template_suffix} doesn't exist, it is created from {table}.

project_id: str, optional

The project to upload to

Returns:

Union[bool, dict]

bool indicating if the insert succeeded or not, or response from BigQuery if swallow_results is set to False.
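
A minimal sketch of streaming two placeholder rows, using the id column as the insertId key:

    # Placeholder rows for the events table.
    rows = [
        {'id': 1, 'name': 'alice'},
        {'id': 2, 'name': 'bob'},
    ]
    inserted = client.push_rows('my_dataset', 'events', rows, insert_id_key='id')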

query(query, max_results=None, timeout=0, dry_run=False, use_legacy_sql=None, external_udf_uris=None)

Submit a query to BigQuery.

Parameters:

query : str

BigQuery query string

max_results : int, optional

The maximum number of rows to return per page of results.

timeout : float, optional

How long to wait, in seconds, for the query to complete before the request times out and returns.

dry_run : bool, optional

If True, the query isn’t actually run. A valid query will return an empty response, while an invalid one will return the same error message it would if it wasn’t a dry run.

use_legacy_sql : bool, optional. Default True.

If False, the query will use BigQuery’s standard SQL (https://cloud.google.com/bigquery/sql-reference/)

external_udf_uris : list, optional

Contains external UDF URIs. If given, URIs must be Google Cloud Storage and have .js extensions.

Returns:

tuple

(job id, query results) if the query completed. If dry_run is True, job id will be None and results will be empty if the query is valid or a dict containing the response if invalid.

Raises:

BigQueryTimeoutException

on timeout
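
A hedged sketch of running a query with a timeout, guarding against the BigQueryTimeoutException described above (placeholder query):

    # Submit a query and wait up to 10 seconds for it to complete.
    try:
        job_id, results = client.query(
            'SELECT id, name FROM my_dataset.events LIMIT 10',
            use_legacy_sql=False,
            timeout=10,
        )
    except Exception:  # BigQueryTimeoutException on timeout
        results = []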

classmethod schema_from_record(record)

Given a dict representing a record instance to be inserted into BigQuery, calculate the schema.

Parameters:

record : dict

representing a record to be inserted into BigQuery, where all keys are str objects (representing column names in the record) and all values are of type int, str, unicode, float, bool, datetime, or dict. A dict value represents a record, and must conform to the same restrictions as record.

Returns:

list

BigQuery schema

Notes

Results are undefined if a different value type is provided for a repeated field, e.g.:

>>> { rfield: [ { x: 1}, {x: "a string"} ] } # undefined!
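
A short sketch of deriving a schema from a placeholder record; the nested dict becomes a nested (record) field in the result:

    from bigquery.client import BigQueryClient

    # Placeholder record; keys become column names, values determine types.
    record = {'id': 1, 'name': 'alice', 'meta': {'source': 'web'}}
    schema = BigQueryClient.schema_from_record(record)
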
update_dataset(dataset_id, friendly_name=None, description=None, access=None, project_id=None)

Updates information in an existing dataset. The update method replaces the entire dataset resource, whereas the patch method only replaces fields that are provided in the submitted dataset resource.

Parameters:

dataset_id : str

Unique str identifying the dataset within the project (the referenceId of the dataset)

friendly_name : str, optional

An optional descriptive name for the dataset.

description : str, optional

An optional description of the dataset.

access : list, optional

Indicating access permissions

project_id: str, optional

Unique str identifying the BigQuery project containing the dataset

Returns:

Union[bool, dict]

bool indicating if the update was successful or not, or response from BigQuery if swallow_results is set to False.

update_table(dataset, table, schema, project_id=None)

Update an existing table in the dataset.

Parameters:

dataset : str

The dataset to update the table in

table : str

The name of the table to update

schema : dict

Table schema

project_id: str, optional

The project to update the table in

Returns:

Union[bool, dict]

bool indicating if the table was successfully updated or not, or response from BigQuery if swallow_results is set to False.

wait_for_job(job, interval=5, timeout=60)

Waits until the job indicated by job is done or has failed.

Parameters:

job : Union[dict, str]

dict representing a BigQuery job resource, or a str representing the BigQuery job id

interval : float, optional

Polling interval in seconds, default = 5

timeout : float, optional

Timeout in seconds, default = 60

Returns:

dict

The final state of the job resource

Raises:

Union[JobExecutingException, BigQueryTimeoutException]

On http/auth failures or timeout

write_to_table(query, dataset=None, table=None, external_udf_uris=None, allow_large_results=None, use_query_cache=None, priority=None, create_disposition=None, write_disposition=None, use_legacy_sql=None, maximum_billing_tier=None, flatten=None, project_id=None)

Write a query result to a table. If dataset or table is not provided, BigQuery will write the result to a temporary table. Optional arguments that are not specified are determined by BigQuery as described: https://developers.google.com/bigquery/docs/reference/v2/jobs

Parameters:

query : str

BigQuery query string

dataset : str, optional

String id of the dataset

table : str, optional

String id of the table

external_udf_uris : list, optional

Contains external UDF URIs. If given, URIs must be Google Cloud Storage and have .js extensions.

allow_large_results : bool, optional

Whether or not to allow large results

use_query_cache : bool, optional

Whether or not to use query cache

priority : str, optional

One of the JOB_PRIORITY_* constants

create_disposition : str, optional

One of the JOB_CREATE_* constants

write_disposition : str, optional

One of the JOB_WRITE_* constants

use_legacy_sql: bool, optional

If False, the query will use BigQuery’s standard SQL (https://cloud.google.com/bigquery/sql-reference/)

maximum_billing_tier : int, optional

Limits the billing tier for this job. Queries that have resource usage beyond this tier will fail (without incurring a charge). If unspecified, this will be set to your project default. For more information, see https://cloud.google.com/bigquery/pricing#high-compute

flatten : bool, optional

Whether or not to flatten nested and repeated fields in query results

project_id: str, optional

String id of the project

Returns:

dict

A BigQuery job resource

Raises:

JobInsertException

On http/auth failures or error in result
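
Finally, a hedged sketch of materializing a query result into a destination table and waiting for the job to finish (placeholder names):

    # Write the query result into my_dataset.events_copy.
    job = client.write_to_table(
        'SELECT id, name FROM my_dataset.events',
        dataset='my_dataset',
        table='events_copy',
        use_legacy_sql=False,
    )
    # Block until the job completes (or the timeout is reached).
    job = client.wait_for_job(job, timeout=120)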