Category: Data Analysis
Why eBay needed a Big Data solution
Questions for Discussion: Why did eBay need a Big Data solution? What were the challenges, the proposed solution, and the obtained results?
Challenges of a digital signature
Write a 6-page paper (deliverable length does not include the title and reference pages)
What are benefits and challenges of a digital signature?
What are the properties that a digital signature should have?
Will digital signatures be able to keep up with the industry of tomorrow?
Create a 3-page PPT for the above.
DATA ANALYSIS
CATEGORICAL DATA ANALYSIS
Create a research question using the General Social Survey dataset that can be answered using categorical analysis.
RESOURCES: Chapters 10 and 11 of the Frankfort-Nachmias & Leon-Guerrero text
Use SPSS to answer the research question. Post your response to the following:
(1) Include the General Social Survey Dataset’s mean of Age to verify the dataset you used.
(2) What is your research question?
(3) What is the null hypothesis for your question?
(4) What research design would align with this question?
(5) What dependent variable was used and how is it measured?
(6) What independent variable is used and how is it measured?
(7) If you found significance, what is the strength of the effect?
(8) Explain your results for a lay audience and further explain what the answer is to your research question.
Be sure to support your output and Response Post with reference to the week’s Learning Resources and other scholarly evidence in APA Style.
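The assignment asks for SPSS, but the arithmetic behind one common categorical analysis can be sketched in a few lines. The sketch below assumes a chi-square test of independence with Cramér's V as the measure of effect strength; that choice, the function name, and the 2x2 counts are illustrative assumptions, not values from the General Social Survey.

```python
import math

def chi_square_and_cramers_v(table):
    """Return (chi-square statistic, degrees of freedom, Cramér's V)
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    n_rows, n_cols = len(table), len(col_totals)
    dof = (n_rows - 1) * (n_cols - 1)
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return chi2, dof, cramers_v

# Hypothetical counts (NOT GSS data): rows are two groups, columns two answers.
toy_table = [[30, 20],
             [20, 30]]
chi2, dof, v = chi_square_and_cramers_v(toy_table)
print(chi2, dof, v)
```

The statistic and degrees of freedom would still need to be compared against the chi-square distribution to obtain a p-value, which SPSS reports directly in its crosstabs output.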
Dataset
Q1: Which factor affects fraud the most? (Logistic Regression Model)
The box plot above indicates that income, credit limit, age, and name_email_similarity exhibit significant data dispersion. This suggests that these factors could be significant determinants of fraud. While the box plot provides a visual representation of the data dispersion, it is merely an initial observation. To further explore the correlation between these factors and fraud, we will be using Logistic Regression.
2. Split the data into a train and test set.
70% of the data is used for training, and the remaining 30% is used for testing. These are the train and test datasets, respectively.
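A minimal sketch of a 70/30 split follows. The report does not show its own code, so the function name, the fixed seed, and the stand-in list of 1,000 row indices are all assumptions made for illustration.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the fraud dataset: 1,000 row indices.
rows = list(range(1000))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before cutting matters when the rows are ordered (for example by date), because otherwise the test set would contain only the latest records.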
3. Find the fitted model.
We use the glm() function to fit the first model, m1, which regresses fraud_bool on all variables, with data Analy_train, family binomial, and the logit link (logistic regression). Its AIC value is 4090.7. However, we did not believe this was the best model, so we tried different combinations of variables repeatedly and finally arrived at the best model, m3.
Here is the summary of m3:
In m3, the AIC drops to 4077.1. At the same time, we noticed that some payment types, employment statuses, housing statuses, and devices have been eliminated from the model. We suppose that those do not affect fraud. However, the m3 model does not directly tell us which factor affects fraud the most, so we need to look at the p-value of each coefficient. Among all the p-values, the device with Windows has the smallest one. This suggests that whether the device runs Windows affects fraud the most.
Q2. What is the relationship between fraud and income? (Linear Regression Model)
According to the screenshot, fraud and income do not have a strong relationship in the linear regression model: the R-squared is very small, which means the linear model explains almost none of the variation in fraud. As a result, we can only use the coefficient from the logistic model:
Fraud = -9.03 + 1.354 * Income
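However, in the Logistic Regression Model, this equation is not a direct relationship: its right-hand side is the log-odds of fraud, which the logistic function converts into a predicted probability. When the predicted probability is bigger than 0.5, we predict fraud. When it is smaller than 0.5, we predict no fraud.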
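A short sketch of how the 0.5 cutoff is applied: it treats the right-hand side of the equation as log-odds and passes it through the logistic function. The coefficients come from the equation reported earlier; the function names and the income value 0.9 are assumptions for illustration, and all other predictors in m3 are ignored here.

```python
import math

def fraud_probability(income, intercept=-9.03, slope=1.354):
    """Convert the linear predictor (log-odds) into a probability."""
    log_odds = intercept + slope * income
    return 1.0 / (1.0 + math.exp(-log_odds))

def predict_fraud(income, threshold=0.5):
    """Predict fraud when the estimated probability exceeds the threshold."""
    return fraud_probability(income) > threshold

# Hypothetical income value on the dataset's scale.
p = fraud_probability(0.9)
print(p, predict_fraud(0.9))
```

With these two coefficients alone, the probability crosses 0.5 only when the log-odds reach zero, that is, at an income of 9.03 / 1.354, so the other predictors in the full model do most of the work of pushing a case over the cutoff.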
Q3. What is the relationship between fraud and age? (Linear Regression Model)
The screenshot shows a similar situation. The R-squared is only 0.0006527, which is too small, so we can only give the equation from the Logistic Regression Model:
Fraud = -9.03 + 0.02081 * Age
Q4. For the people who are not fraudulent, is there any relationship between age and income? (Linear Regression Model)
As shown above, we can see a positive correlation between age and income for those who are not fraudulent. This is because, in general, income tends to increase with age as people gain more experience and advance in their careers. Higher-income people also have more knowledge about financial matters, which makes them better able to identify fraud.
The Linear Regression Line is:
Income = 0.48793084 + 0.00288834 * Age
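To read the fitted line, it can help to evaluate it at a couple of ages. The sketch below uses the intercept and slope reported above; the function name and the two example ages are assumptions chosen for illustration.

```python
def predict_income(age, intercept=0.48793084, slope=0.00288834):
    """Predicted income (on the dataset's scale) for a given age."""
    return intercept + slope * age

# Hypothetical ages to compare.
income_20 = predict_income(20)
income_60 = predict_income(60)
difference = income_60 - income_20
print(income_20, income_60, difference)
```

The difference between the two predictions is 40 times the slope, which shows how small the change in income is per year of age even though the correlation is positive.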
Q5. Do Velocity 6h, Velocity 24h, and Velocity 4w have the same mean? (ANOVA)
Step 1:
Null Hypothesis: Velocity 6h, Velocity 24h, and Velocity 4w have the same mean.
Mean (Velocity 6h) = Mean (Velocity 24h) = Mean (Velocity 4w)
Alternative Hypothesis: The means are not all the same.
Step 2:
Step 3:
According to the data, the p-value is smaller than 0.05, so we can reject the Null Hypothesis and accept the Alternative Hypothesis.
Step 4:
As a result, we can say that the mean of Velocity 6h, the mean of Velocity 24h and the mean of Velocity 4w are not the same.
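The F statistic behind a one-way ANOVA can be sketched as follows. The three short lists are toy values, not the actual Velocity columns, and the function name is an assumption. A p-value would additionally require the F distribution (for example via scipy.stats.f_oneway), which is left out to keep the sketch dependency-free.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy samples standing in for Velocity 6h, Velocity 24h, and Velocity 4w.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(toy_groups)
print(f_stat, df_between, df_within)
```

A large F statistic means the group means differ by more than the within-group spread would explain, which is what leads to a small p-value and the rejection of the Null Hypothesis.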
Data Collection Tool
How to identify and fix the gaps in security management in order to prevent cyber attacks.
Under the heading "Method and Design" in your research proposal, regardless of the method or approach used (qualitative, quantitative, or mixed methods), you will need to discuss the data collection instrument (tool) you propose to use to gather data. The word instrument is the general term that researchers use for a measurement device (survey, questionnaire, test, interview, observation, etc.). Discuss the instrument you intend to use in your research proposal and how you intend to address the instrument’s validity and reliability, noting the different types of each.
Dimensionality of the data by converting numerical variables
Option 1:
Using the file Longitudinal Survey, subset the data to select only those individuals who lived in an urban area. Reduce the dimensionality of the data by converting numerical variables such as age, height, weight, number of years of education, number of siblings, family size, number of weeks employed, self-esteem scale, and income into a smaller set of principal components that retain at least 90% of the information in the original data. After showing your work, summarize your findings in a paragraph containing no more than five sentences. It should be clear how PCA improved one’s ability to interpret the data. Note: you will need to standardize the data prior to PCA as the scales of the variables are different.
Show your workings and summarize the findings.
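The procedure described in Option 1 (standardize, run PCA, keep enough components for at least 90% of the variance) can be sketched with NumPy. The data below are synthetic, since the Longitudinal Survey file is not reproduced here; the function name, the random seed, and the mixing matrix are assumptions for illustration only.

```python
import numpy as np

def pca_components_for_variance(X, target=0.90):
    """Standardize X, run PCA, and keep the fewest components whose
    cumulative explained variance reaches the target."""
    X = np.asarray(X, dtype=float)
    # Standardize each column so variables on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The covariance of standardized data is the correlation matrix.
    corr = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals / eigvals.sum())
    k = int(np.searchsorted(cumulative, target) + 1)
    scores = Z @ eigvecs[:, :k]  # principal component scores
    return k, cumulative, scores

# Synthetic data: five observed variables driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.8, 0.6, 0.0, 0.2],
                   [0.0, 0.3, 0.5, 1.0, 0.9]])
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

k, cumulative, scores = pca_components_for_variance(X)
print(k, cumulative)
```

The returned scores can then be used in place of the original variables, which is also how the principal components would serve as predictors in Option 2.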
Option 2:
Using the file House Price, choose one of the college towns in the data set. Reduce the dimensionality of the data by converting numerical variables such as number of bedrooms, number of bathrooms, home square footage, lot square footage, and age of the house into a smaller set of the principal components that retain at least 90% of the information in the original data. Then use the principal components as predictor variables for building two de novo models for predicting sale prices of houses. Summarize each of the two models in two or three sentences each.
Show your workings and develop models of home prices.
Understanding or application of case study data analysis.
Consider the readings for this module concerning the analysis of case study data. In your post, address the following:
- What three key ideas were most significant from the readings?
- One element/issue/concept that you found difficult in your understanding or application of case study data analysis.
Running a string of informants in a major street crimes
You and your partner are running a string of informants in a major street crimes unit that includes narcotics, weapons, and organized crime. One informant is a beautiful young woman who was caught with some marijuana and is now working with your team, identifying drug dealers. She has been flirting with every male member of the team, and even with a female member on one occasion. You have a second informant who is very friendly and brings coffee and pastries to meetings with the team. You note that he is being treated by the team as a member of the team. Finally, a third informant who is a paid informant is a dynamo. He brings six or seven issues at a time for the team to work on. He gets paid by his production and seizures in the cases he comes up with. You notice he is starting to overwhelm the team with potential targets of criminal activity.
Address the following:
In terms of ethics, what are the 3 informants in the scenario jeopardizing? Explain.
How should the team deal with the 3 informants based on their behavior? Explain and fully justify your argument.
What should you tell the members of the team concerning how you want your informants run on the street? Why do you think this will be effective? Explain.
Post a new topic to the Discussion Board that contains your responses to the above questions.