Practical-4

Aim: Visual programming with the Orange tool.

Theory:

Visual programming

Visual programming is an approach to programming that lets people describe processes using illustrations instead of written code. For example, to create a to-do list app with a visual programming tool, the programmer draws out the flow of the app. The resulting flowchart describes the screens, the user interactions, and what happens to the data at each stage.

Data Sampler:

Inputs

Data: input dataset

Outputs

Data Sample: sampled data instances
Remaining Data: out-of-sample data

The Data Sampler widget implements several data sampling methods. It outputs a sampled dataset and a complementary dataset (the instances from the input set that are not included in the sample). The output is produced once the input dataset is provided and Sample Data is pressed.

· Information on the input and output dataset.
· The desired sampling method:

Fixed proportion of data returns a chosen percentage of the entire data (e.g. 70% of all the data).
Fixed sample size returns a selected number of data instances, with the option to set Sample with replacement, which always samples from the entire dataset (instances already drawn stay available for the next draw).
Cross Validation partitions data instances into the specified number of complementary subsets. Following a typical validation schema, all subsets except the one selected by the user are output as Data Sample, and the selected subset goes to Remaining Data. (Note: In older versions, the outputs were swapped. If the widget is loaded from an older workflow, it switches to compatibility mode.)
Bootstrap draws instances from the input with replacement to form the sample, so that a population statistic can be inferred from it.

Press Sample Data to output the data sample.
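
The sampling methods above can be sketched in plain Python to show what ends up on each output. This is only an illustration and not Orange's own implementation: the function names, the `seed` parameter, and the 20-row dataset are assumptions made for the example. Each function returns a pair that mirrors the widget's two outputs, Data Sample and Remaining Data.

```python
import random

def fixed_proportion(data, proportion, seed=0):
    """Return (sample, remaining) with roughly `proportion` of the rows in the sample."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = round(len(data) * proportion)
    chosen = set(indices[:cut])
    sample = [row for i, row in enumerate(data) if i in chosen]
    remaining = [row for i, row in enumerate(data) if i not in chosen]
    return sample, remaining

def fixed_size(data, size, replacement=False, seed=0):
    """Return (sample, remaining) with exactly `size` rows in the sample."""
    rng = random.Random(seed)
    if replacement:
        # Every draw is made from the entire dataset, so rows can repeat.
        picks = [rng.randrange(len(data)) for _ in range(size)]
    else:
        picks = rng.sample(range(len(data)), size)
    chosen = set(picks)
    sample = [data[i] for i in picks]
    remaining = [row for i, row in enumerate(data) if i not in chosen]
    return sample, remaining

def bootstrap(data, seed=0):
    """Draw len(data) rows with replacement; rows never drawn are the remainder."""
    return fixed_size(data, len(data), replacement=True, seed=seed)

rows = list(range(20))  # hypothetical dataset of 20 instances
sample, remaining = fixed_proportion(rows, 0.7)
print(len(sample), len(remaining))
```

With replacement, `fixed_size` can return more rows than the input contains, which is not possible when sampling without replacement.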

Example

· Fixed Sample Size

First, let’s see how the Data Sampler works. We will use the Insurance dataset, loaded with the File widget.
We sampled the data with the Data Sampler widget.
We chose a fixed sample size of 5 instances.
We can observe the sampled data in the Data Table widget.
The second Data Table (out of sample) shows the remaining instances that weren’t in the sample. To output the out-of-sample data, double-click the connection between the widgets and rewire the output to Remaining Data –> Data.

· Fixed Proportion of Data (Split into Train and Test)

Now, we will use the Data Sampler to split the data into training and testing parts. We are using the SuperStore dataset, which we loaded with the File widget.
In Data Sampler, we split the data with Fixed proportion of data, keeping 70% of the data instances in the sample.
Then we connected both outputs to the Test and Score widget: Data Sample –> Data (the 70% sample) and Remaining Data –> Test Data (the other 30%). Finally, we added Logistic Regression as the learner: Logistic Regression –> Test and Score. This runs logistic regression on the Data input and evaluates the results on the Test Data.
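
The same train-and-test workflow can be sketched in plain Python. This is a minimal illustration under stated assumptions, not what Orange runs internally: the data is a hypothetical one-feature dataset generated on the spot (not the SuperStore data), and `train_logistic` is a small hand-written logistic regression fitted by stochastic gradient descent rather than Orange's Logistic Regression learner.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, epochs=200, lr=0.1):
    """Fit a weight and a bias for a single feature by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in rows:
            error = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the score
            w -= lr * error * x
            b -= lr * error
    return w, b

def accuracy(model, rows):
    """Share of rows whose predicted class (probability >= 0.5) matches the label."""
    w, b = model
    hits = sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in rows)
    return hits / len(rows)

rng = random.Random(0)
# Hypothetical data: one numeric feature x in [-3, 3], class 1 when x > 0.
data = [(x, int(x > 0)) for x in (rng.uniform(-3, 3) for _ in range(100))]
rng.shuffle(data)

cut = len(data) * 70 // 100             # fixed proportion of 70%
train, test = data[:cut], data[cut:]    # Data Sample -> Data, Remaining Data -> Test Data

model = train_logistic(train)
test_accuracy = accuracy(model, test)
print(len(train), len(test), test_accuracy)
```

The model only ever sees the 70% sample during fitting, so the accuracy on the remaining 30% estimates how it behaves on unseen data.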

· Cross Validation

- The most typical procedure is cross validation, which splits the data into k folds and uses k – 1 folds for training and the remaining fold for testing.

- This procedure is repeated, so that each fold has been used for testing exactly once.
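
The fold rotation described above can be sketched as follows. The helper name `k_fold_splits` is hypothetical, and the folds are built from consecutive indices without shuffling, which is a simplification made for this example.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_instances))
    # Deal the indices into k nearly equal, non-overlapping subsets.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        # The other k - 1 folds together form the training set.
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test

splits = list(k_fold_splits(10, 5))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each instance appears in the test set of exactly one split and in the training set of the other k - 1 splits.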

Now, we will use the Data Sampler to split the data into training and testing parts. We are using the SuperStore dataset, which we loaded with the File widget.
In Data Sampler, we split the data with Cross Validation, using 10 subsets.
Then we connected Data Sampler –> Test and Score and added Logistic Regression as the learner: Logistic Regression –> Test and Score.
Because our dataset is small, we need to reduce the number of folds; the appropriate number depends on the size of the dataset.

D17IT156 JAYDIP JAYSWAL
