Posts

Showing posts from November, 2017

8.3, due on December 1

The proof of the commutativity of the convolution was difficult to follow, with the change of variables and all. Also, it's hard for me to grasp the relationship between the Hadamard product and the convolution. It helped to see the example of removing low frequencies by using convolutions, but I'm still not totally clear on it. I had no idea you could convolve a function with a Kronecker delta to basically produce copies of it at different locations! That's pretty neat. And after reading Example 8.3.12, it makes sense how convolutions and component-wise multiplication can be used to edit a discrete signal.
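To convince myself about the Kronecker delta fact, I wrote a little Python sketch (my own toy example, not from the book). Convolving with a delta centered at index 3 really does just shift the signal by 3, and commutativity is easy to check numerically, since swapping the arguments only reindexes the same sum via j -> k - j.

```python
# Circular convolution of a signal with a shifted Kronecker delta
# just shifts (copies) the signal to the delta's location.
def circular_convolve(f, g):
    n = len(f)
    return [sum(f[j] * g[(k - j) % n] for j in range(n)) for k in range(n)]

f = [1, 2, 3, 4, 0, 0, 0, 0]
delta_3 = [0] * 8
delta_3[3] = 1                 # Kronecker delta centered at index 3
print(circular_convolve(f, delta_3))   # f shifted by 3: [0, 0, 0, 1, 2, 3, 4, 0]

# Commutativity: same result with the arguments swapped.
print(circular_convolve([1, 2, 3], [4, 5, 6]) ==
      circular_convolve([4, 5, 6], [1, 2, 3]))
```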

8.2, due on November 29

I'm excited to see what cool applications there are for the Fast Fourier Transform, since it seems applicable to anything having to do with sensors in a system. I imagine all sorts of uses in algorithms that need to use discrete measurements to predict what's going to happen in the future for continuous systems (like self-driving cars?). Something that doesn't seem intuitive to me is how the discrete Fourier transform is the coefficients of the projection of f onto some subspace. I'd like to see some sort of visualization to better understand it. I also don't understand why the primitive nth roots of unity become zero when the exponent is not a multiple of n. Also, why is the temporal complexity of the FFT O(n log n)?
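After poking at it myself: as I understand it, the O(n log n) comes from the divide-and-conquer recurrence T(n) = 2T(n/2) + O(n), since the FFT splits a length-n transform into even-indexed and odd-indexed halves and recombines them in linear time, giving log n levels of O(n) work. And the roots-of-unity fact is a geometric series: with ratio w^k != 1, the sum telescopes to (w^(kn) - 1)/(w^k - 1) = 0. Here's a quick numerical check of that identity (my own sketch):

```python
# For w = e^{2*pi*i/n}, the sum of w^(j*k) over j = 0..n-1 equals
# n when n divides k, and 0 otherwise (geometric series with ratio w^k).
import cmath

def root_power_sum(n, k):
    w = cmath.exp(2j * cmath.pi / n)
    return sum(w ** (j * k) for j in range(n))

print(abs(root_power_sum(8, 3)))        # 8 does not divide 3 -> near 0
print(abs(root_power_sum(8, 16)))       # 8 divides 16 -> near 8
```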

8.1, due on November 27

Theorem 8.1.7 explains a sufficient condition for the Fourier series of a function to converge to it pointwise. I guess what it's saying is that at a discontinuity the function needs to be equal to the average of its one-sided limits. That seems intuitive. I'm confused by Example 8.1.8, in the part where the Fourier series of the function is given. I don't see how we made the jump from calculating c_k to getting the infinite series that we got. No calculations were given. From the reading it looks like Fourier transforms are to signal processing as eigenvalues are to other areas of engineering. They have lots of applications in various fields. I'm personally interested in signal processing when applied to phonetics and speech processing, because that's a very interesting field that still has many unsolved problems.
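To see the average-of-limits behavior concretely, I tried the classic square wave (my own example, not Example 8.1.8): f(x) = sign(x) on (-pi, pi) has Fourier series (4/pi) times the sum over odd k of sin(kx)/k. At the jump x = 0, every partial sum is exactly 0, the average of the left limit (-1) and right limit (+1).

```python
# Partial Fourier sums of the square wave f(x) = sign(x) on (-pi, pi).
import math

def square_wave_partial_sum(x, terms):
    return (4 / math.pi) * sum(
        math.sin(k * x) / k for k in range(1, 2 * terms, 2)
    )

print(square_wave_partial_sum(math.pi / 2, 200))  # near f(pi/2) = 1
print(square_wave_partial_sum(0.0, 200))          # exactly 0 at the jump
```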

Appendix B, due on November 20

I had never seen a definition of the complex modulus before, so it was good for me to see it rigorously defined. I also have never seen roots of unity explained in quite the same way. I used them a tiny bit in pre-calculus in high school, but only used them to talk about different complex roots of polynomials (I think?). I definitely don't understand the proof for Proposition B.1.6. I'm missing some intuition about Euler's Formula. I've seen graphical explanations that in the end have not explained anything. I'd like to hear what you have to say on the subject. It was also interesting to read about fields, and see the proof of Proposition B.2.4. The style of proof required was one I'd never used before.
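While I still want a real explanation of *why* Euler's Formula holds, I at least checked numerically (my own sketch) that e^(it) = cos(t) + i sin(t), that the nth roots of unity e^(2*pi*i*j/n) really are roots of z^n - 1, and that they all have modulus 1, i.e. they sit on the unit circle:

```python
# Numerical checks of Euler's formula and the n-th roots of unity.
import cmath, math

t = 0.7
lhs = cmath.exp(1j * t)
rhs = complex(math.cos(t), math.sin(t))
print(abs(lhs - rhs))                       # essentially 0

n = 5
roots = [cmath.exp(2j * cmath.pi * j / n) for j in range(n)]
print(all(abs(z ** n - 1) < 1e-9 for z in roots))    # each solves z^5 = 1
print(all(abs(abs(z) - 1) < 1e-12 for z in roots))   # modulus 1: unit circle
```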

7.2, due on November 20

The conditions for the universality of the uniform aren't obvious to me, because I don't see why it's necessary to have a continuous inverse to allow us to use F as our c.d.f. It's cool to see how you can choose any distribution you want to sample from as long as it's always above the distribution you actually want to use, rejecting values that are above the actual distribution. I wonder if it makes sense to talk about using distributions along the y-axis that aren't uniform. I suppose that would mess up the intuition of the probability equaling the area under the distribution, which might render the sampling incorrect. In Example 7.2.9 I didn't quite see how to choose f_R(x) the way the authors did. Is there a method for coming up with the distribution that was chosen in this example?
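Both ideas clicked better once I coded little toy versions (mine, not the book's examples). Universality of the uniform: if U ~ Uniform(0,1) and F is an invertible c.d.f., then F^(-1)(U) has c.d.f. F; for Exponential(1), F(x) = 1 - e^(-x), so F^(-1)(u) = -ln(1 - u). And rejection sampling: propose uniformly, accept a point with probability f(x)/M where M bounds the density.

```python
import math, random

random.seed(0)

# Universality of the uniform: F^{-1}(U) ~ Exponential(1).
samples = [-math.log(1 - random.random()) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)                    # Exponential(1) has mean 1

# Rejection sampling on [0, 1] for a density f bounded by M.
def rejection_sample(f, M, rng):
    while True:
        x = rng.random()
        if rng.random() * M <= f(x):   # accept with probability f(x)/M
            return x

f = lambda x: 2 * x            # triangular density on [0, 1], max value 2
tri = [rejection_sample(f, 2, random) for _ in range(50_000)]
print(sum(tri) / len(tri))     # E[X] = integral of 2x^2 = 2/3
```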

7.1, due on November 17

The idea of Monte Carlo integration is pretty cool. It made a lot of sense when the book explained how it was similar to a Riemann sum but with points randomly selected from a uniform distribution on [a,b]. That helped me understand the fundamental difference between Monte Carlo integration and other ways to compute the integral. I would like to go over in detail why the SEM has the equation it does. That was the least intuitive part of the section. I don't feel like I did so well on the test yesterday, and I've thought about how I can improve. I think my goals for the rest of the semester are to take time to ponder the reading, pay 100% attention in class, and proactively do the homework by myself each day, and then meet with the others to work on the problems I didn't understand. I think that'll give me a more solid foundational understanding of the content of each section, and I'll be more prepared for exams.
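Here's the basic recipe as I understand it, in a sketch of my own: the integral is approximated by (b - a) times the average of f at uniform random points, and the SEM of that estimate is (b - a) times the sample standard deviation divided by sqrt(n), since the variance of a mean of n i.i.d. samples is the single-sample variance over n.

```python
# Monte Carlo integration of f on [a, b] with a standard-error estimate.
import math, random

def mc_integrate(f, a, b, n, rng=random):
    vals = [f(a + (b - a) * rng.random()) for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)   # sample variance
    estimate = (b - a) * mean
    sem = (b - a) * math.sqrt(var / n)                   # shrinks like 1/sqrt(n)
    return estimate, sem

random.seed(1)
est, sem = mc_integrate(math.sin, 0, math.pi, 100_000)
print(est, sem)   # true value is 2; est should land within a few sem of it
```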

Ancestry DNA presentation

The speaker is a computational biologist who works on the Ancestry DNA project. Before coming to Ancestry he worked on statistical research in genetic mutations that lead to disease. There's been research showing that large ethnic groups have DNA differences that can be traced back to their regions of origin, and even more recently some research has shown that differences can be traced to small communities (e.g. regions of the UK), so we can get even more fine-grained accuracy. Another problem arises when we look at finding cousins based on DNA received from shared ancestors. I feel like it would be very difficult to recognize that two people are cousins just by DNA, since it dilutes so quickly. Also, I had no idea network theory was involved in DNA research. But it makes sense, since genealogy is just a huge network. They also used machine learning to accomplish classifications by DNA, which is really cool because I'd never thought to use machine learning for that!

Exam prep, due on November 15

I think the most important concepts from the section are the Central Limit Theorem and having a basic understanding of the meaning of the different equations for each distribution: p.d.f., c.d.f., etc. I imagine that on the test there will be quite a few questions asking us to recognize which distribution will be best for the given data, and applying it to find the probability of certain values. I really need to spend time memorizing the different equations for each distribution. I've got several of them down, but I don't know all of them (especially the ones for the section on Bayesian statistics).

6.4, due on November 13

The explanation of what Bayesian statistics is was really well put. It makes sense to not just talk about expected values for different parameters of a distribution but also the entire distribution of that parameter. I can see how to do the calculations in the discrete case, but it would be nice to work through an example of how to do it in a continuous case. To me, it seems like in the worked example of the continuous case we don't choose the Beta distribution for any reason other than the fact that it can represent lots of different assumptions about what the unknown parameter probably is. This contrasts with my understanding of the different distributions from the previous chapters. In the previous chapters, each distribution had a nice example of something in the real world that was distributed in that way. This time, it seems like we're breaking away from that into just using the p.m.f.s and p.d.f.s as we like.
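Since I said I can follow the discrete case, here's my own toy version of it, which also approximates the continuous Beta example: put a flat prior on a grid of candidate values for a coin's heads-probability p, multiply by the likelihood, and normalize. With a uniform prior (i.e. Beta(1,1)) and 7 heads, 3 tails, the exact continuous posterior would be Beta(8,4), with mean 8/12.

```python
# Discrete-grid Bayes update for a coin's unknown heads-probability p.
grid = [i / 100 for i in range(101)]      # candidate values of p
prior = [1 / 101] * 101                   # flat prior over the grid

heads, tails = 7, 3                       # observed data (made up)
likelihood = [p ** heads * (1 - p) ** tails for p in grid]
unnorm = [pr * lk for pr, lk in zip(prior, likelihood)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]   # posterior is prior*likelihood, normalized

post_mean = sum(p * w for p, w in zip(grid, posterior))
print(post_mean)   # close to the exact Beta(8, 4) posterior mean 8/12
```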

6.3, due on November 10

The central limit theorem is really cool, because until now I had no idea why so many real-life examples had a normal distribution. It's striking that the sum of many independent random variables, almost regardless of their underlying distribution, approaches a normal distribution. The statement of the theorem itself is a little confusing, and I'd like to go over how all the variables interact and what each of the equations mean. But on the whole I find it very interesting how powerful this theorem is. Also, I struggle to see why certain things are put into the normal distribution. Are they \mu and \sigma^2 for each distribution? If so, why? (E.g. N(np,np(1-p)) for the binomial distribution.)
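Trying to answer my own question with a sketch: I believe N(np, np(1-p)) appears because those are exactly the mean and variance of Binomial(n, p), and the approximating normal has to match the mean and variance of the thing it approximates. Simulating binomials as sums of Bernoullis seems to confirm it:

```python
# A Binomial(n, p) is a sum of n Bernoulli(p) trials; check that its
# sample mean and variance match np and np(1-p).
import random

random.seed(2)
n, p = 50, 0.3
trials = [sum(1 for _ in range(n) if random.random() < p)
          for _ in range(20_000)]

sample_mean = sum(trials) / len(trials)
sample_var = sum((t - sample_mean) ** 2 for t in trials) / (len(trials) - 1)
print(sample_mean, n * p)             # both near 15
print(sample_var, n * p * (1 - p))    # both near 10.5
```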

6.2, due on November 8

The proof of Markov's Inequality didn't make a lot of sense to me, probably because I'm not very familiar with indicator functions. The result of Chebyshev's Inequality is pretty neat, given that it works for any distribution and is intuitive to think about. I don't have a good intuition for why the law of large numbers sometimes doesn't give us any information, as in Nota Bene 6.2.10. I was surprised by how simple the proof for the weak law of large numbers was. This could be very useful for determining how large we need our sample size to be to get the accuracy we want.
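To see the distribution-free part in action, I wrote a quick check (my own sketch): Chebyshev says P(|X - mu| >= k*sigma) <= 1/k^2 no matter what the distribution is, so the empirical tail fraction of an Exponential(1) sample (where mu = sigma = 1) should sit under the bound.

```python
# Empirical check of Chebyshev's inequality on Exponential(1) data.
import random

random.seed(3)
xs = [random.expovariate(1.0) for _ in range(100_000)]
mu, sigma = 1.0, 1.0            # true mean and standard deviation of Exp(1)

for k in (2, 3):
    frac = sum(1 for x in xs if abs(x - mu) >= k * sigma) / len(xs)
    print(k, frac, 1 / k ** 2)  # observed tail fraction vs the 1/k^2 bound
```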

6.1, due on November 6

I had a hard time understanding example 6.1.5, because I didn't follow how the double sum over the combined estimate was broken down. It'd be good to go over why that's the case so I can understand how to break it down for other statistics. What did make a lot of sense to me was the maximum likelihood estimation, because I can see how the likelihood of some parameter is equivalent to the density of a random variable given that parameter. Generally though, I think it'd be nice if the introduction to the section contained a little more in the way of examples, so I could get a real-life context to these estimations and likelihoods. What's the difference between the maximum likelihood estimate and the estimator, if the equations look identical in every example? I'm excited to see how these methods are applied to machine learning.
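On my estimate-vs-estimator question, my current understanding is this: the *estimator* is the formula viewed as a function of a random sample (so it's itself a random variable), while the *estimate* is the number you get by plugging in the data you actually observed, which is why the expressions look identical. A sketch with the exponential distribution (my own example): the log-likelihood n*log(lam) - lam*sum(x) is maximized at lam_hat = n/sum(x) = 1/x_bar.

```python
# MLE for the rate of an Exponential(lam) sample: lam_hat = 1 / sample mean.
import random

def mle_rate(xs):
    return len(xs) / sum(xs)   # the estimator, as a formula

random.seed(4)
true_lam = 2.5
data = [random.expovariate(true_lam) for _ in range(100_000)]
print(mle_rate(data))          # the estimate: should be close to 2.5
```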

5.7, due on November 3

From my reading I'm not sure how useful multivariate random variables really are. It seems like basically we're just performing the same math on more than one random variable at a time. What would make it interesting is if we saw multivariate random variables that contained dependent random variables inside them. OK, I wrote the previous paragraph before reading the section on covariance. It makes a lot more sense to use multivariate random variables now... Some of the notation was confusing at the bottom of the first page, where it talks about the covariance in terms of the average value of the entire multivariate X. Also, I didn't follow how the covariance was reached from the expected value in Example 5.7.16, because I don't see E[X_B] or E[X_W] calculated anywhere.
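To pin down the definition for myself (a generic sketch, not the book's Example 5.7.16): Cov(X, Y) = E[XY] - E[X]E[Y], computed here directly from paired data, where perfectly dependent variables give Cov(X, 2X) = 2 Var(X).

```python
# Covariance of paired data straight from Cov(X, Y) = E[XY] - E[X]E[Y].
def covariance(xs, ys):
    n = len(xs)
    ex = sum(xs) / n
    ey = sum(ys) / n
    exy = sum(x * y for x, y in zip(xs, ys)) / n
    return exy - ex * ey

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]          # perfectly dependent: y = 2x
print(covariance(xs, ys))  # 2 * Var(x) = 2 * 1.25 = 2.5
```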

LinkedIn presentation at Careers in Math

Sara Smoot Gerrard spoke today about data science and machine learning. She discussed a lot of the problems involved with the idea of "data science", something that hasn't been clearly defined to a lot of people. When companies want to "provide value" to their customers, they often don't know what they mean. It's a data scientist's job to refine that goal and decide what data can be used to improve the value of the product. It is often difficult to decide what features to use in a model, when there could be billions of features available and not all of them matter. For example, for discovering groups on LinkedIn they found it was better to limit the product to only give suggestions of groups that already had people connected to the user. To do well in data science, it's important to have the mathematical and statistical skills as well as solid coding ability. Without either one, a data scientist will not have the ability to conceptualize a so...

5.6, due on November 1

I'm wondering why we chose to require that the function be well-defined from negative infinity up. Why does it need to be defined so far left if we (maybe) aren't going to use those values? I think because of this confusion the equations for the c.d.f. aren't intuitive to me. It's intuitive to see that the probability density function is the derivative of the cumulative distribution function. It seems to me that that's almost by definition, since we define the c.d.f. as the probability over the interval (-\infty, k), and that seems like a direct consequence of the FTC. The formula for the expectation of X is super obvious to me, thankfully. It's cool to see all these formulas I'd seen in statistics worked out mathematically.
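Answering my own first question, as I understand it: F(x) = P(X <= x) has to make sense for *every* real x, even where the variable can't land, so the density is simply 0 to the left and F is 0 there. And the FTC relationship is easy to check numerically (my own sketch, using Exponential(1), where F(x) = 1 - e^(-x) and f(x) = e^(-x)):

```python
# A small difference quotient of the c.d.f. should match the p.d.f. (FTC).
import math

def cdf(x):
    return 1 - math.exp(-x) if x >= 0 else 0.0   # defined on all of R

def pdf(x):
    return math.exp(-x) if x >= 0 else 0.0

x, h = 1.3, 1e-6
numeric_derivative = (cdf(x + h) - cdf(x - h)) / (2 * h)
print(numeric_derivative, pdf(x))   # both approximately e^{-1.3}
```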