相关性还是因果效应？¶

假设您获得了有关治疗和结果的一些数据。 Can you determine whether the treatment causes the outcome, or the correlation is purely due to another common cause?

[1]:

import os, sys
sys.path.append(os.path.abspath("../../../"))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import dowhy
from dowhy import CausalModel
import dowhy.datasets, dowhy.plotter

数据集¶

让我们创建一个神秘数据集，我们需要为其确定是否存在因果效应。它是从以下两个模型之一生成的：

Model 1: Treatment does cause outcome.
Model 2: Treatment does not cause outcome. 观察到的相关性都是由于共同原因造成的。

也就是说两个模型的 treatment and outcome 都具备相关性，但是其相关性的来源不相同。

[90]:

rvar = 1 if np.random.uniform() >0.5 else 0 # 是否具备因果效应, 它要么是0要么是1
data_dict = dowhy.datasets.xy_dataset(10000, effect=rvar, sd_error=0.2)
df = data_dict['df']
print(df[["Treatment", "Outcome", "w0"]].head())
rvar
# data_dict.keys()
# data_dict['gml_graph'], data_dict['ate'], data_dict['common_causes_names'], data_dict['time_val']

   Treatment    Outcome        w0
0   5.813169  11.124906 -0.261496
1   9.915709  19.694752  3.890721
2   9.874701  19.852529  3.829520
3  10.082831  20.137682  3.983658
4   7.479847  15.629233  1.899290

[90]:

$\displaystyle 1$

[91]:

dowhy.plotter.plot_treatment_outcome(df[data_dict["treatment_name"]], df[data_dict["outcome_name"]],
                             df[data_dict["time_val"]])

WARNING:matplotlib.legend:No handles with labels found to put in legend.

../_images/example_notebooks_dowhy_confounder_example_4_1.png

Does Treatment cause Outcome?¶

Using DoWhy to resolve the mystery: Does Treatment cause Outcome? 对于该观测数据，我们如下的方法可以准确估计模拟数据中的因果效应。

STEP 1: Model the problem as a causal graph¶

初始化因果模型。

[92]:

model= CausalModel(
        data=df,
        treatment=data_dict["treatment_name"],
        outcome=data_dict["outcome_name"],
        common_causes=data_dict["common_causes_names"],
        instruments=data_dict["instrument_names"],
        proceed_when_unidentifiable=True)
model.view_model(layout="dot")

WARNING:dowhy.causal_model:Causal Graph not provided. DoWhy will construct a graph based on data inputs.
INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. Adding a node named "Unobserved Confounders" to reflect this.
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['Treatment'] on outcome ['Outcome']

显示存储在本地文件 “causal_model.png” 中的因果图

[93]:

from IPython.display import Image, display
display(Image(filename="causal_model.png"))

../_images/example_notebooks_dowhy_confounder_example_9_0.png

STEP 2: Identify causal effect using properties of the formal causal graph¶

使用因果图来识别因果效应。

[94]:

identified_estimand = model.identify_effect()
print(identified_estimand)

INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['w0', 'U']
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
INFO:dowhy.causal_identifier:Continuing by ignoring these unobserved confounders because proceed_when_unidentifiable flag is True.
INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:[]

Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
     d
────────────(Expectation(Outcome|w0))
d[Treatment]
Estimand assumption 1, Unconfoundedness: If U→{Treatment} and U→Outcome then P(Outcome|Treatment,w0,U) = P(Outcome|Treatment,w0)
### Estimand : 2
Estimand name: iv
No such variable found!

STEP 3: Estimate the causal effect¶

Once we have identified the estimand, 我们可以使用任何统计方法来估计因果效应。为了简单起见，让我们使用线性回归。

[95]:

estimate = model.estimate_effect(identified_estimand,
        method_name="backdoor.linear_regression")
print("Causal Estimate is " + str(estimate.value))

# Plot Slope of line between treamtent and outcome =causal effect
dowhy.plotter.plot_causal_effect(estimate, df[data_dict["treatment_name"]], df[data_dict["outcome_name"]])

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Outcome~Treatment+w0

Causal Estimate is 0.9939024153048353

../_images/example_notebooks_dowhy_confounder_example_13_2.png

检查估计是否正确

[96]:

print("DoWhy estimate is " + str(estimate.value))
print ("Actual true causal effect was {0}".format(rvar))

DoWhy estimate is 0.9939024153048353
Actual true causal effect was 1

Step 4: Refuting the estimate¶

我们还可以反驳该估计值，以检查其是否符合假设（aka 敏感性分析）。

添加一个随机的共因变量

[97]:

res_random=model.refute_estimate(identified_estimand, estimate, method_name="random_common_cause")
print(res_random)

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Outcome~Treatment+w0+w_random

Refute: Add a Random Common Cause
Estimated effect:(0.9939024153048353,)
New effect:(0.9940274254487518,)

Replacing treatment with a random (placebo) variable

[98]:

res_placebo=model.refute_estimate(identified_estimand, estimate,
        method_name="placebo_treatment_refuter", placebo_type="permute")
print(res_placebo)

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Outcome~placebo+w0

Refute: Use a Placebo Treatment
Estimated effect:(0.9939024153048353,)
New effect:(-0.00023549427292834935,)

删除数据的随机子集

[99]:

res_subset=model.refute_estimate(identified_estimand, estimate,
        method_name="data_subset_refuter", subset_fraction=0.9)
print(res_subset)

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Outcome~Treatment+w0

Refute: Use a subset of data
Estimated effect:(0.9939024153048353,)
New effect:(0.9953760162895797,)

As you can see, our causal estimator is robust to simple refutations.