OtherPapers.com - Other Term Papers and Free Essays

Data Preprocessing

Essay by   •  February 3, 2018  •  Research Paper  •  314 Words (2 Pages)



Neural network

1) Fit a neural network to the entire data set.

We run several models, ranging from a single hidden layer to two hidden layers with 15 nodes, and experiment with the number of nodes using the default TanH activation function. We start with three nodes and gradually increase the number of nodes in the model. After examining the resulting confusion matrices, we find that twelve nodes give a good model. The inputs and results are shown below. [pic 1]
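The node search described above can be sketched in plain Python. This is a minimal illustration, not the procedure the report's software ran: it assumes a binary 0/1 response, a single hidden layer of TanH nodes feeding a logistic output (the two-layer variants are not covered), plain stochastic gradient descent, and a separate validation set for comparing node counts. The function names (`train_tanh_net`, `choose_hidden_nodes`, and so on) and the learning settings are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_tanh_net(X, y, n_hidden, epochs=300, lr=0.1, seed=0):
    """Hypothetical sketch: one hidden layer of TanH nodes and a logistic
    output node, trained by stochastic gradient descent on cross-entropy."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
                 for j in range(n_hidden)]
            p = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
            d_out = p - t  # gradient of the loss w.r.t. the output node's net input
            for j in range(n_hidden):
                d_hid = d_out * W2[j] * (1.0 - h[j] ** 2)  # tanh' = 1 - tanh^2
                W2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid * x[i]
            b2 -= lr * d_out
    return W1, b1, W2, b2


def predict_proba(net, x):
    """Forward pass: probability that x belongs to class 1."""
    W1, b1, W2, b2 = net
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)


def misclassification(net, X, y, cutoff=0.5):
    """Share of rows whose predicted class differs from the actual class."""
    wrong = sum((predict_proba(net, x) >= cutoff) != bool(t) for x, t in zip(X, y))
    return wrong / len(y)


def choose_hidden_nodes(X_train, y_train, X_valid, y_valid, candidates=range(3, 16)):
    """Fit one network per node count and keep the lowest validation error."""
    results = {}
    for n in candidates:
        net = train_tanh_net(X_train, y_train, n)
        results[n] = misclassification(net, X_valid, y_valid)
    best = min(results, key=results.get)
    return best, results
```

Because `min` returns the first of several equal minima, ties go to the smallest node count, which favors the simpler network.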

[pic 2]

The misclassification rate is 9.5%. Its RSquare reaches 49.4% and its RMSE is 25.83%, which suggests that this model fits relatively well.
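The three fit statistics quoted above can be computed as follows for a binary response. The report does not give its formulas, so these are assumptions: the misclassification rate uses a 0.5 probability cutoff, RMSE compares the 0/1 outcome with the fitted probability, and RSquare is taken to be a generalized (Nagelkerke-style) RSquare based on the log-likelihood. The function names are hypothetical.

```python
import math


def misclassification_rate(y_true, prob, cutoff=0.5):
    """Share of rows whose predicted class (prob >= cutoff -> 1) differs from the actual 0/1 class."""
    return sum((p >= cutoff) != bool(t) for t, p in zip(y_true, prob)) / len(y_true)


def rmse(y_true, prob):
    """Root mean squared difference between the 0/1 outcome and the fitted probability.
    For a 0/1 outcome, |t - p| equals 1 minus the fitted probability of the level that occurred."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, prob)) / len(y_true))


def generalized_rsquare(y_true, prob):
    """Assumed definition: Cox-Snell R^2 rescaled to 0..1 (Nagelkerke)."""
    n = len(y_true)
    base = sum(y_true) / n  # the intercept-only model predicts the overall rate
    ll_model = sum(math.log(p if t else 1 - p) for t, p in zip(y_true, prob))
    ll_null = sum(math.log(base if t else 1 - base) for t in y_true)
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell
```

A model that only predicts the overall rate scores 0 on this RSquare, and a model that gives every row probability 1 for its actual class approaches 1.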

[pic 3]

2) Fit a neural network to the oversampled data. [pic 4][pic 5]

Lift curve

[pic 6]
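A lift curve ranks rows by predicted probability and compares the share of positives among the top-ranked rows with the overall share. A minimal sketch, assuming a 0/1 response and probability scores; `lift_curve` is a hypothetical helper, not the routine that produced the figure.

```python
def lift_curve(y_true, scores):
    """Return (portion of rows, lift) pairs with rows ranked by score, highest first.
    Lift = positive rate among the top rows / overall positive rate."""
    ranked = [t for _, t in sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    base_rate = sum(ranked) / n
    points, hits = [], 0
    for k, t in enumerate(ranked, start=1):
        hits += t
        points.append((k / n, (hits / k) / base_rate))
    return points
```

The curve always ends at lift 1.0, because the whole data set has exactly the overall positive rate; a useful model starts well above 1.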

ROC curve

[pic 7]
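An ROC curve traces the true positive rate against the false positive rate as the probability cutoff is lowered, and the area under it (AUC) summarizes how well the model ranks positives above negatives. A minimal sketch, assuming a 0/1 response with both classes present; `roc_points` and `auc` are hypothetical helpers.

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) at every distinct score cutoff."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # rows with tied scores cross the cutoff together (one diagonal step)
        while i < len(pairs) and pairs[i][0] == cutoff:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 1.0 means every positive is ranked above every negative; 0.5 is no better than random ordering.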

3) Fit a neural network to the oversampled data after duplication.
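The report does not say exactly how the duplicated data set was built. A minimal sketch, assuming duplication means randomly copying rows of each smaller class (with replacement) until every class matches the largest one; `oversample_by_duplication` and its `label_of` argument are hypothetical.

```python
import random


def oversample_by_duplication(rows, label_of, seed=0):
    """Copy randomly chosen rows of each smaller class (with replacement)
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced
```

Duplication is best applied to the training rows only; duplicated copies in a validation set would make the error estimates look better than they are.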

After several trials, we find that the best model uses only one hidden layer with eleven TanH nodes. The confusion matrices are shown below. [pic 8][pic 9]
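A confusion matrix counts actual classes against predicted classes, and the misclassification rate is the share of rows off the diagonal. A minimal sketch; `confusion_matrix` and `misclassification_from_matrix` are hypothetical helpers, and the `Yes`/`No` labels in the check below are illustrative only.

```python
from collections import Counter


def confusion_matrix(actual, predicted, levels):
    """Rows = actual level, columns = predicted level, cells = counts."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in levels] for a in levels]


def misclassification_from_matrix(matrix):
    """1 - (diagonal total / grand total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 1 - correct / total
```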

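The misclassification rate is 18.54% and the RMSE is 36.40%; both are higher than for the model fitted to the entire data. However, this model’s RSquare is higher, at 56.60%, so by that measure it fits better. Because the oversampled data have a different class balance from the entire data, these error rates are not directly comparable.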

Lift curve [pic 10]

ROC curve [pic 11]

[pic 12]

Discriminant analysis

1) Perform discriminant analysis on the entire data set.

Discriminant analysis requires continuous predictors, while the response variable is categorical. We therefore need to convert all categorical predictors into continuous variables.

To do so, we create indicator columns for the categorical predictors and use these indicators in the discriminant analysis.
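The indicator-column step can be sketched as below, assuming rows are stored as dictionaries. This version drops the first level as a reference category, which is an assumption because the report does not say whether it kept every level: a full set of indicators always sums to 1, which would make the pooled covariance matrix used by discriminant analysis singular. The `column[level]` naming and the helper name are hypothetical.

```python
def add_indicator_columns(rows, column):
    """Replace a categorical column with 0/1 indicator columns, one per level
    except the first (reference) level in sorted order."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for level in levels[1:]:
            new_row[f"{column}[{level}]"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded
```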

[pic 13]

Perform the discriminant analysis:

[pic 14]
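For reference, a minimal from-scratch sketch of linear discriminant analysis. It assumes one common (pooled) covariance matrix for all classes and priors proportional to class frequency; the report does not state which variant or priors it used, and all names here are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_lda(X, y):
    """Class means, priors and pooled within-class covariance -> linear scores."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    means, priors = {}, {}
    for c in classes:
        rows = [x for x, t in zip(X, y) if t == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[c] = len(rows) / n  # priors proportional to class frequency
    S = [[0.0] * p for _ in range(p)]
    for x, t in zip(X, y):
        d = [xi - mi for xi, mi in zip(x, means[t])]
        for i in range(p):
            for j in range(p):
                S[i][j] += d[i] * d[j]
    S = [[v / (n - len(classes)) for v in row] for row in S]
    model = {}
    for c in classes:
        w = solve(S, means[c])  # w_c = S^-1 mu_c
        b = -0.5 * sum(wi * mi for wi, mi in zip(w, means[c])) + math.log(priors[c])
        model[c] = (w, b)
    return model


def predict_lda(model, x):
    """Assign x to the class with the largest linear discriminant score."""
    def score(c):
        w, b = model[c]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(model, key=score)
```

Each class c gets a linear score w_c · x + b_c, with w_c = S^-1 mu_c and b_c = -0.5 * mu_c' S^-1 mu_c + ln(prior_c); a row is assigned to the class with the highest score.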

2) Perform discriminant analysis on the oversampled data.

After revising the variables:

[pic 15]

Perform discriminant analysis with the dummy variables and the continuous predictors:

[pic 16]

ROC curve

[pic 17]

3) Perform discriminant analysis on the oversampled data after duplication.

Revise the variables first. [pic 18]

Perform the discriminant analysis. [pic 19]

As the screenshot shows, this model has a misclassification rate of 20.14% and an RSquare of 36.03%.

The model could therefore be used for prediction, but it is not recommended: its misclassification rate and RSquare are both worse than those of the neural network fitted to the same duplicated data.

ROC curve

[pic 20]

...

...
