dowhy package

Submodules

dowhy.causal_estimator module

class dowhy.causal_estimator.CausalEstimate(estimate, target_estimand, realized_estimand_expr, **kwargs)[source]

Bases: object

Class for the estimate object that every causal estimator returns

add_effect_strength(strength_dict)[source]
add_params(**kwargs)[source]
add_significance_test_results(test_results)[source]
class dowhy.causal_estimator.CausalEstimator(data, identified_estimand, treatment, outcome, control_value=0, treatment_value=1, test_significance=False, evaluate_effect_strength=False, confidence_intervals=False, target_units=None, effect_modifiers=None, params=None)[source]

Bases: object

Base class for an estimator of causal effect.

Subclasses implement different estimation methods. All estimation methods are in the package “dowhy.causal_estimators”.

construct_symbolic_estimator(estimand)[source]
do(x)[source]

Method that implements the do-operator.

Given a value x for the treatment, returns the expected value of the outcome when the treatment is set (intervened on) to the value x.

Parameters

x – Value of the treatment

Returns

Value of the outcome when treatment is intervened/set to x.
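For intuition, the do-operator with a backdoor adjustment can be sketched in plain NumPy. This is an illustration of the concept under an assumed linear model with a single observed confounder, not this method's implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
w = rng.normal(size=n)                       # observed confounder
t = w + rng.normal(size=n)                   # treatment influenced by w
y = 3.0 * t + 2.0 * w + rng.normal(size=n)   # true effect of t on y is 3.0

# Backdoor adjustment with an assumed linear model: regress y on t and w,
# then average predictions over the observed confounder distribution.
X = np.column_stack([np.ones(n), t, w])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def do(x):
    """E[Y | do(T=x)] under the fitted linear model."""
    return float(np.mean(coef[0] + coef[1] * x + coef[2] * w))

print(do(1.0) - do(0.0))  # approximately the true effect, 3.0
```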

estimate_effect()[source]

Base estimation method that delegates to the estimate_effect implementation of the concrete subclass.

Can optionally also test significance and estimate effect strength for any returned estimate.

TODO: Enable methods to return a confidence interval in addition to the point estimate.

Parameters

self – object instance of class Estimator

Returns

point estimate of causal effect
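For contrast with the adjusted estimate, here is the idea behind a naive (unadjusted) effect estimate, sketched in plain NumPy. This is an illustration of the concept, not the library's implementation:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10000
w = rng.normal(size=n)                        # unadjusted confounder
t = (w + rng.normal(size=n) > 0).astype(int)  # treatment depends on w
y = 2.0 * t + 3.0 * w + rng.normal(size=n)    # true effect of t is 2.0

# The naive difference in mean outcomes between treated and control units
# ignores the confounder and is biased upward here.
naive = y[t == 1].mean() - y[t == 0].mean()
print(naive)  # noticeably larger than the true effect, 2.0
```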

estimate_effect_naive()[source]
evaluate_effect_strength(estimate)[source]
test_significance(estimate, num_simulations=None)[source]

Test statistical significance of obtained estimate.

Uses resampling to create a non-parametric significance test. This is a general procedure; individual estimators can override this method.

Parameters
  • self – object instance of class Estimator

  • estimate – obtained estimate

  • num_simulations – (optional) number of simulations to run

Returns

class dowhy.causal_estimator.RealizedEstimand(identified_estimand, estimator_name)[source]

Bases: object

update_assumptions(estimator_assumptions)[source]
update_estimand_expression(estimand_expression)[source]

dowhy.causal_graph module

class dowhy.causal_graph.CausalGraph(treatment_name, outcome_name, graph=None, common_cause_names=None, instrument_names=None, effect_modifier_names=None, observed_node_names=None, missing_nodes_as_confounders=False)[source]

Bases: object

Class for creating and modifying the causal graph.

Accepts a graph string (or a text file) in gml format (preferred) and dot format. Graphviz-like attributes can be set for edges and nodes. E.g. style=”dashed” as an edge attribute ensures that the edge is drawn with a dashed line.

If a graph string is not given, names of treatment, outcome, and confounders, instruments and effect modifiers (if any) can be provided to create the graph.
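For reference, a minimal GML graph string of the kind accepted here might look like the following (illustrative variable names: one confounder Z of treatment X and outcome Y):

```python
# A confounder Z of treatment X and outcome Y, written as a GML graph string.
gml_graph = """graph [
    directed 1
    node [ id "Z" label "Z" ]
    node [ id "X" label "X" ]
    node [ id "Y" label "Y" ]
    edge [ source "Z" target "X" ]
    edge [ source "Z" target "Y" ]
    edge [ source "X" target "Y" ]
]"""
```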

add_missing_nodes_as_common_causes(observed_node_names)[source]
add_node_attributes(observed_node_names)[source]
add_unobserved_common_cause(observed_node_names)[source]
all_observed(node_names)[source]
build_graph(common_cause_names, instrument_names, effect_modifier_names)[source]

Creates nodes and edges based on variable names and their semantics.

Currently only considers the graphical representation of “direct” effect modifiers. Thus, all effect modifiers are assumed to be “direct” unless otherwise expressed using a graph. Based on the taxonomy of effect modifiers by VanderWeele and Robins: “Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology. 2007.”

do_surgery(node_names, remove_outgoing_edges=False, remove_incoming_edges=False)[source]
filter_unobserved_variables(node_names)[source]
get_ancestors(node_name, new_graph=None)[source]
get_causes(nodes, remove_edges=None)[source]
get_common_causes(nodes1, nodes2)[source]

Assume that nodes1 causes nodes2 (e.g., nodes1 are the treatments and nodes2 are the outcomes)

get_descendants(node_name)[source]
get_effect_modifiers(nodes1, nodes2)[source]
get_instruments(treatment_nodes, outcome_nodes)[source]
get_parents(node_name)[source]
get_unconfounded_observed_subgraph()[source]
view_graph(layout='dot')[source]

dowhy.causal_identifier module

class dowhy.causal_identifier.CausalIdentifier(graph, estimand_type, proceed_when_unidentifiable=False)[source]

Bases: object

Class that implements different identification methods.

Currently supports backdoor and instrumental variable identification methods. The identification is based on the causal graph provided.

Other specific ways of identification, such as the ID* algorithm, minimal adjustment criteria, etc. will be added in the future. If you’d like to contribute, please raise an issue or a pull request on GitHub.

construct_backdoor_estimand(estimand_type, treatment_name, outcome_name, common_causes)[source]
construct_iv_estimand(estimand_type, treatment_name, outcome_name, instrument_names)[source]
identify_effect()[source]

Main method that returns an identified estimand (if one exists).

Uses both backdoor and instrumental variable methods to check if an identified estimand exists, based on the causal graph.

Parameters

self – instance of the CausalEstimator class (or its subclass)

Returns

target estimand, an instance of the IdentifiedEstimand class
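The backdoor part of identification can be illustrated on a toy DAG in pure Python. This is a sketch of the idea (the class itself operates on a CausalGraph object), with an assumed parent-map encoding:

```python
# Toy DAG as a parent map: Z is an instrument, W a common cause,
# X the treatment, Y the outcome.
parents = {
    "Z": [],
    "W": [],
    "X": ["Z", "W"],
    "Y": ["X", "W"],
}

def ancestors(node, parents):
    """All ancestors of node in the DAG given by the parent map."""
    seen, stack = set(), list(parents[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen

# A parent of the treatment needs adjustment if it still reaches the outcome
# once the treatment's outgoing edges are removed (i.e., via a backdoor path).
parents_without_x = {k: [p for p in v if p != "X"] for k, v in parents.items()}
backdoor_set = sorted(set(parents["X"]) & ancestors("Y", parents_without_x))
print(backdoor_set)  # ['W']
```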

class dowhy.causal_identifier.IdentifiedEstimand(treatment_variable, outcome_variable, estimand_type=None, estimands=None, backdoor_variables=None, instrumental_variables=None)[source]

Bases: object

Class for storing a causal estimand, typically as a result of the identification step.

set_identifier_method(identifier_name)[source]

dowhy.causal_model module

Module containing the main model class for the dowhy package.

class dowhy.causal_model.CausalModel(data, treatment, outcome, graph=None, common_causes=None, instruments=None, effect_modifiers=None, estimand_type='nonparametric-ate', proceed_when_unidentifiable=False, missing_nodes_as_confounders=False, **kwargs)[source]

Bases: object

Main class for storing the causal model state.

do(x, identified_estimand, method_name=None, method_params=None)[source]

Do operator for estimating values of the outcome after intervening on treatment.

Parameters
  • identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method

  • method_name – any of the estimation method to be used. See docs for estimate_effect method for a list of supported estimation methods.

  • method_params – Dictionary containing any method-specific parameters. These are passed directly to the estimating method.

Returns

an instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information

estimate_effect(identified_estimand, method_name=None, control_value=0, treatment_value=1, test_significance=None, evaluate_effect_strength=False, confidence_intervals=False, target_units='ate', effect_modifiers=None, method_params=None)[source]

Estimate the identified causal effect.

Currently requires an explicit method name to be specified. Method names follow the convention of identification method followed by the specific estimation method: “[backdoor/iv].estimation_method_name”. The following methods are supported.
  • Propensity Score Matching: “backdoor.propensity_score_matching”

  • Propensity Score Stratification: “backdoor.propensity_score_stratification”

  • Propensity Score-based Inverse Weighting: “backdoor.propensity_score_weighting”

  • Linear Regression: “backdoor.linear_regression”

  • Instrumental Variables: “iv.instrumental_variable”

  • Regression Discontinuity: “iv.regression_discontinuity”

In addition, you can directly call any of the EconML estimation methods. The convention is “backdoor.econml.path-to-estimator-class”. For example, for the double machine learning estimator (“DMLCateEstimator” class) that is located inside “dml” module of EconML, you can use the method name, “backdoor.econml.dml.DMLCateEstimator”.

Parameters
  • identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method

  • method_name – name of the estimation method to be used.

  • control_value – Value of the treatment in the control group, for effect estimation. If treatment is multi-variate, this can be a list.

  • treatment_value – Value of the treatment in the treated group, for effect estimation. If treatment is multi-variate, this can be a list.

  • test_significance – Binary flag on whether to additionally do a statistical significance test for the estimate.

  • evaluate_effect_strength – (Experimental) Binary flag on whether to estimate the relative strength of the treatment’s effect. This measure can be used to compare different treatments for the same outcome (by running this method with different treatments sequentially).

  • confidence_intervals – (Experimental) Binary flag indicating whether confidence intervals should be computed.

  • target_units – (Experimental) The units for which the treatment effect should be estimated. This can be of three types. (1) a string for common specifications of target units (namely, “ate”, “att” and “atc”), (2) a lambda function that can be used as an index for the data (pandas DataFrame), or (3) a new DataFrame that contains values of the effect_modifiers and effect will be estimated only for this new data.

  • effect_modifiers – Names of effect modifier variables can be (optionally) specified here too, since they do not affect identification. If None, the effect_modifiers from the CausalModel are used.

  • method_params – Dictionary containing any method-specific parameters. These are passed directly to the estimating method. See the docs for each estimation method for allowed method-specific params.

Returns

An instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information

identify_effect(proceed_when_unidentifiable=None)[source]

Identify the causal effect to be estimated, using properties of the causal graph.

Parameters

proceed_when_unidentifiable – Binary flag indicating whether identification should proceed in the presence of (potential) unobserved confounders.

Returns

a probability expression (estimand) for the causal effect if identified, else None

refute_estimate(estimand, estimate, method_name=None, **kwargs)[source]

Refute an estimated causal effect.

If method_name is provided, uses the provided method. In the future, we may support automatic selection of suitable refutation tests. The following refutation methods are supported.
  • Adding a randomly-generated confounder: “random_common_cause”

  • Adding a confounder that is associated with both treatment and outcome: “add_unobserved_common_cause”

  • Replacing the treatment with a placebo (random) variable: “placebo_treatment_refuter”

  • Removing a random subset of the data: “data_subset_refuter”

Parameters
  • estimand – target estimand, an instance of the IdentifiedEstimand class (typically, the output of identify_effect)

  • estimate – estimate to be refuted, an instance of the CausalEstimate class (typically, the output of estimate_effect)

  • method_name – name of the refutation method

  • **kwargs

    (optional) additional arguments that are passed directly to the refutation method. Can specify a random seed here to ensure reproducible results (‘random_seed’ parameter). For method-specific parameters, consult the documentation for the specific method. All refutation methods are in the causal_refuters subpackage.

Returns

an instance of the RefuteResult class
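The placebo refutation above can be sketched in plain NumPy: re-run the estimator after replacing the treatment with a random permutation of itself, and check that the "effect" vanishes. This illustrates the idea under an assumed linear estimator; it is not the library's code:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
w = rng.normal(size=n)
t = (w + rng.normal(size=n) > 0).astype(float)
y = 2.0 * t + w + rng.normal(size=n)          # true effect of t is 2.0

def linear_effect(t, y, w):
    """OLS coefficient on t in the regression y ~ 1 + t + w."""
    X = np.column_stack([np.ones_like(t), t, w])
    return float(np.linalg.lstsq(X, y, rcond=None)[0][1])

original = linear_effect(t, y, w)
# Placebo refutation: swap the treatment for a random permutation of itself;
# a sound estimator should now report an effect close to zero.
placebo = linear_effect(rng.permutation(t), y, w)
print(round(original, 2), round(placebo, 2))
```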

summary()[source]

Print a text summary of the model.

Returns

None

view_model(layout='dot')[source]

View the causal DAG.

Parameters

layout – string specifying the layout of the graph.

Returns

a visualization of the graph

dowhy.causal_refuter module

class dowhy.causal_refuter.CausalRefutation(estimated_effect, new_effect, refutation_type)[source]

Bases: object

Class for storing the result of a refutation method.

add_significance_test_results(refutation_result)[source]
class dowhy.causal_refuter.CausalRefuter(data, identified_estimand, estimate, **kwargs)[source]

Bases: object

Base class for different refutation methods.

Subclasses implement specific refutation methods.

DEFAULT_NUM_SIMULATIONS = 100
static get_estimator_object(new_data, identified_estimand, estimate)[source]
perform_bootstrap_test(estimate, simulations)[source]
perform_normal_distribution_test(estimate, simulations)[source]
refute_estimate()[source]
test_significance(estimate, simulations, test_type='auto', significance_level=0.05)[source]

Tests the statistical significance of the obtained estimate against the simulations produced by a refuter.

The refuter’s sample statistics are used to test the estimate because, ideally, the two are expected to follow the same distribution.

For refutation tests (e.g., the placebo refuter), the null distribution is the distribution of effect estimates over multiple simulations with a placebo treatment, and we compute how likely the expected true estimate (e.g., zero for the placebo test) is under this null. If that probability is lower than the significance level, the estimator method fails the test.

For sensitivity-analysis tests (e.g., the bootstrap, subset, or common-cause refuters), the null distribution captures the distribution of effect estimates under the modified dataset (e.g., with an added confounder or different sampling), and we compute the probability of the obtained estimate under this distribution. If that probability is lower than the significance level, the estimator method fails the test.

Null hypothesis: the estimate is part of the distribution. Alternative hypothesis: the estimate does not fall in the distribution.

Parameters
  • estimate – CausalEstimate. The estimate obtained from the estimator for the original data.

  • simulations – np.array. An array containing the results of the refuter for the simulations.

  • test_type – string, default ‘auto’. The type of test the user wishes to perform.

  • significance_level – float, default 0.05. The significance level for the statistical test.

Returns

significance_dict – a dict containing the p_value and a boolean indicating whether the result is statistically significant
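The resampling test can be sketched in plain NumPy: compute how extreme the estimate is relative to the empirical null distribution of the refuter's simulations. This is an illustration of the idea, not the library's implementation:

```python
import numpy as np

def refuter_p_value(estimate_value, simulations):
    """Two-sided p-value of the estimate under the empirical null given by
    the refuter's simulations (a sketch of the resampling test idea)."""
    sims = np.asarray(simulations)
    center = np.median(sims)
    extreme = np.sum(np.abs(sims - center) >= abs(estimate_value - center))
    return extreme / len(sims)

rng = np.random.default_rng(0)
null_sims = rng.normal(0.0, 1.0, size=2000)   # e.g., placebo-treatment effects
print(refuter_p_value(5.0, null_sims))        # far outside the null: near 0
print(refuter_p_value(0.1, null_sims))        # well inside the null: large
```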

dowhy.data_transformer module

class dowhy.data_transformer.DimensionalityReducer(data_array, ndims, **kwargs)[source]

Bases: object

reduce(target_dimensions=None)[source]

dowhy.datasets module

Module for generating some sample datasets.

dowhy.datasets.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

New in version 1.7.0.

Parameters
  • a – 1-D array-like or int. If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a).

  • size – int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • replace – boolean, optional. Whether the sample is with or without replacement.

  • p – 1-D array-like, optional. The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

Returns

samples – single item or ndarray. The generated random samples.

Raises

ValueError – if a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size.

See also

randint, shuffle, permutation

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3, 1, 0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
dowhy.datasets.construct_col_names(name, num_vars, num_discrete_vars, num_discrete_levels, one_hot_encode)[source]
dowhy.datasets.convert_to_categorical(arr, num_vars, num_discrete_vars, quantiles=[0.25, 0.5, 0.75], one_hot_encode=False)[source]
dowhy.datasets.create_dot_graph(treatments, outcome, common_causes, instruments, effect_modifiers=[])[source]
dowhy.datasets.create_gml_graph(treatments, outcome, common_causes, instruments, effect_modifiers=[])[source]
dowhy.datasets.linear_dataset(beta, num_common_causes, num_samples, num_instruments=0, num_effect_modifiers=0, num_treatments=1, treatment_is_binary=True, outcome_is_binary=False, num_discrete_common_causes=0, num_discrete_instruments=0, num_discrete_effect_modifiers=0, one_hot_encode=False)[source]
dowhy.datasets.sigmoid(x)[source]
dowhy.datasets.simple_iv_dataset(beta, num_samples, num_treatments=1, treatment_is_binary=True, outcome_is_binary=False)[source]

Simple instrumental variable dataset with a single IV and a single confounder.

dowhy.datasets.stochastically_convert_to_binary(x)[source]
dowhy.datasets.xy_dataset(num_samples, effect=True, sd_error=1)[source]
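The sigmoid and stochastic-binary-conversion helpers above can be sketched as follows. This is an assumption about their likely behavior (Bernoulli draws with success probability sigmoid(x)), not the library's exact code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stochastically_convert_to_binary(x, rng=None):
    """Draw Bernoulli samples with success probability sigmoid(x) -- a sketch
    of the likely behavior of these helpers, not the library's exact code."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=float)
    return (rng.random(x.shape) < sigmoid(x)).astype(int)

out = stochastically_convert_to_binary(
    np.array([-10.0, 0.0, 10.0]), rng=np.random.default_rng(0))
print(out)  # extreme inputs are effectively deterministic: 0 and 1
```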

dowhy.do_sampler module

class dowhy.do_sampler.DoSampler(data, params=None, variable_types=None, num_cores=1, causal_model=None, keep_original_treatment=False)[source]

Bases: object

Base class for a sampler from the interventional distribution.

disrupt_causes()[source]

Override this method to render treatment assignment conditionally ignorable.

do_sample(x)[source]
make_treatment_effective(x)[source]

This is likely the implementation you’d like to use, but some methods may require overriding it to make the treatment effective.

point_sample()[source]
reset()[source]

If your DoSampler has more attributes than the _df attribute, you should reset them all to their initialization values by overriding this method.

sample()[source]

By default, this expects a sampler built during class initialization that contains a sample method. Override this method if you want to use a different approach to sampling.

dowhy.plotter module

dowhy.plotter.plot_causal_effect(estimate, treatment, outcome)[source]
dowhy.plotter.plot_treatment_outcome(treatment, outcome, time_var)[source]

Module contents